<![CDATA[Terminal.com blog]]>http://blog.terminal.com/Ghost 0.6Sun, 20 Nov 2016 03:02:26 GMT60<![CDATA[Next up for Terminal.com]]>http://blog.terminal.com/next-up-for-terminal-com/469af4dc-d268-4eb6-8ed1-21eedf99198dTue, 23 Aug 2016 01:54:03 GMT


Sunsetting the container cloud to make way for coding education tools

Today marks an important day for Team Terminal. We are sunsetting our public container cloud to make way for an exciting new focus: powering coding education.

Terminal's beginnings

Terminal.com arose from a need to quickly and easily access cloud computing resources and collaborate on large datasets. During my PhD research, I often found myself frustrated with how much time and setup was required to share research outcomes with colleagues. Data-heavy computations had to be run on local servers that couldn’t dynamically scale, and I had to repackage my findings each time I wanted to share them or get feedback. I was practically spending more time setting up my environment than working with my datasets. There had to be a better way.

With the help of my cofounders, I set out to solve this problem by developing a cloud-based OS for the browser, aided by new (at the time, at least) container-based technology. The idea was that you could spin up a Terminal in your browser, based on any server in your stack, preloaded with your desired runtime environment. You could manipulate your data in the Terminal, share your work with colleagues simply by sending them the URL, snapshot and branch your results, and pause or throw out the Terminal when done. Terminals scaled quickly and dynamically based on changing computing needs, saving users money too. You could run Matlab in your browser without huge setup times! We were getting closer to fulfilling my PhD dream of fast, collaborative infrastructure for scientific computing.

Container cloud launch: it's raining possibilities

We released the first version of Terminal in 2014, and soon discovered that the product had many different use cases -- it was great for data science, education, QA cycles, remote team networking, and more. This was exciting, but also left us with a split focus, since the best features for one use case weren’t always as important for the others. For example, data scientists seemed to prioritize startup speed, whereas individual developers cared more about lowering costs and catching bugs before they reached production.

We also discovered over time that Terminal was being used in industries we hadn’t entirely expected, primarily among educators, who valued the collaborative features and minimal setup. Terminal’s collaboration tools were always the most popular, but the end uses remained disparate. We realized that we needed to focus on one major use case to build out the right feature set and make the product truly useful.

Finding our focus: powering coding education

After speaking with our customers via surveys, phone calls, and Q&A sessions (thank you again for all of the valuable feedback), it became clear that Terminal offers unique benefits to technical education providers, and that we have the opportunity to improve student engagement and success by working with schools and teachers. Terminal is a great tool to supplement original teaching content and drive student success. Our web-based IDE allows coding educators to serve content to their students, scaling up with demand and down with holidays; teach collaboratively in real-time or asynchronous classrooms; quickly get students started on a computer science challenge; seamlessly receive assignments for grading; and much more. We’re lucky enough to be working with some of the best technical content providers out there, and we can’t wait until we can share more with you all.

Next steps and more

Doubling down on education, however, means saying goodbye to our public container cloud. We’re a small team of developers, and can’t provide the public cloud with the support and service it deserves while also building out our education-focused product.

We will disable access to the public cloud on September 30, 2016.

All user information and data will be deleted, so please be sure to migrate your information to a new service before then.

If you're interested in learning more about our new focus on coding education, please sign up for updates.

If you wish to request a refund for purchased credits, please complete a credit request form.

We will continue to support our private cloud solutions, so please reach out if you are interested in continuing your service via an enterprise solution.

We hope to make this transition period as smooth as possible for our customers. There are many excellent cloud hosting providers with GPU servers available; here are a few suggested alternatives to Terminal for developing scientific applications in the cloud.

If you need any help with the transition, please don’t hesitate to reach out.

Stay tuned...

We hope you enjoyed using Terminal’s public container cloud, and appreciate your support over the years.

Stay tuned for our next phase!

Thanks,
Dr. Varun Ganapathi
Founder, Terminal.com

]]>
<![CDATA[Ten unsorted Linux tricks: Volume 3]]>http://blog.terminal.com/10-unsorted-linux-tricks-volume-3/5d4601ca-552d-4b13-8e8f-ec4fea3ef16fTue, 12 Jan 2016 12:58:31 GMT Ten unsorted Linux tricks: Volume 3

Welcome to our third post about simple and (perhaps) useful tricks to use with Linux Terminals.

Check out the first and second posts in the series if you missed them!

Unless noted otherwise, these tricks work in standard bash with any Terminal base snapshot, and in most modern Linux distributions.


Tip 1: Reading Linux man pages, in color

The boring, monochrome way to read a man page is just by using the man command, but you can also install most, a little utility that makes man pages easier to read by adding some colors to them.

Let's install it:

apt-get install most  

(Ubuntu)

yum install most  

(CentOS)

You can also set your default PAGER so that man uses most. If you use bash, you can append the export line to your ~/.bashrc to make the change permanent.

export PAGER="/usr/bin/most -s"  
echo 'export PAGER="/usr/bin/most -s"' >> ~/.bashrc  
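
If you'd rather keep your general pager unchanged and colorize only man pages, man also honors the MANPAGER variable (consulted before PAGER on most systems):

export MANPAGER="/usr/bin/most -s"  
echo 'export MANPAGER="/usr/bin/most -s"' >> ~/.bashrc  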



Tip 2: Split a large file into small pieces

I remember having to do this many years ago, when still dealing with floppy disks. You can split a file into smaller pieces of a certain size by using the split command.

Check out this example:

root@terminal41657:~ split -a 2 -b 10M big_file.txt small_chunk-  
root@terminal41657:~ du -sm *  
20 big_file.txt  
10 small_chunk-aa  
10 small_chunk-ab  
  • -a 2 means the suffix length is two characters (aa, ab, ac... zz)
  • -b 10M means the size of each chunk is 10M
  • big_file.txt is the name of the file to be divided
  • small_chunk- is the prefix used in the names of the chunks or parts

Now, if you want to recombine the file, you can just use the cat command:

root@terminal41657:~ cat small_chunk-* > big_again.txt  

Easy, right?
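
If you want to be sure the reassembled file is identical to the original, compare checksums or do a byte-by-byte comparison:

md5sum big_file.txt big_again.txt          # the two checksums should match  
cmp big_file.txt big_again.txt && echo "files are identical"  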


Tip 3: Merge or split PDF files directly in your Terminal

We will use the convert command-line utility, which is part of the ImageMagick package. In order to use convert, you first need to install ImageMagick by executing:

apt-get install imagemagick  

(Ubuntu)

yum install ImageMagick  

(CentOS)

Now that we have Imagemagick, we can merge and split files.

Merging PDF files:

convert part1.pdf part2.pdf merged.pdf  

This command will merge part1.pdf and part2.pdf into one single file called merged.pdf.

You can also merge a subset of pages instead of all pages in the input files:

convert file1.pdf[0-5] file2.pdf[3,7-9] merged.pdf  

In this example, we merge pages 0 to 5 of file1.pdf and pages 3, 7, 8, and 9 of file2.pdf into a single merged.pdf file.

Splitting a PDF file:

This takes certain pages of the input PDF file and copies them into different output files:

convert input.pdf[0-10] pages0to10.pdf  
convert input.pdf[11-20] pages11to20.pdf  

In this example we divide the file in two by taking certain subsets of pages from the same file and saving them in two different output files.
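
One caveat worth mentioning (my addition): convert rasterizes PDF pages through Ghostscript, so text can come out blurry at the default resolution. Raising the rendering density usually helps, at the cost of larger files:

# render pages at 300 dpi before writing the output PDF
convert -density 300 input.pdf[0-10] pages0to10.pdf  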


Tip 4: Hide a command from the bash command line history

You may want to do this for different reasons -- for instance, when passing a plaintext password as a command argument (a bad practice, of course!).

To hide a command from the bash history, just type a blank space at the beginning of the line.

$ mysql -uroot -pMYPASSWORD # This will appear in the bash history.
$  mysql -uroot -pMYPASSWORD # This will NOT appear in the bash history. 
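
Note that the leading-space trick only works when the HISTCONTROL variable includes ignorespace (or ignoreboth, which also drops consecutive duplicates). Many distributions set this by default, but you can enable it explicitly:

export HISTCONTROL=ignoreboth  
echo 'export HISTCONTROL=ignoreboth' >> ~/.bashrc  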

Tip 5: Renaming files in bulk

We will use the rename command to change the names of several files in a batch. Imagine that we have a folder with several hundred photo files downloaded from a camera, with names like DSC_XXX.jpg, where XXX is a number.

rename -v 's/DSC/birthday-party/' *.jpg  
DSC_001.jpg renamed as birthday-party_001.jpg  
DSC_002.jpg renamed as birthday-party_002.jpg  
DSC_003.jpg renamed as birthday-party_003.jpg  
.
.
.

's/DSC/birthday-party/' is a Perl-compatible substitution expression.



If you want to replace ALL occurrences of a pattern in a file name, you must make the substitution global with the g flag. For example, if you want to replace all spaces in the file names with nothing (i.e., remove all spaces):

$ rename -v 's/ //g' *.pdf
Terminal Local T Documentation.pdf renamed as TerminalLocalTDocumentation.pdf  
Curriculum Vitae 2015-09-05.pdf  renamed as CurriculumVitae2015-09-05.pdf  



Now, suppose you want to replace all occurrences of spaces, underscores, and round brackets with the dash character - :

$ rename -v 's/[ _()]/-/g' *
upload_pic (001).jpg renamed as upload-pic-001-.jpg  
upload_pic (002).jpg renamed as upload-pic-002-.jpg  
2001 A Space Odyssey (Novel) Arthur C Clarke.pdf renamed as 2001-A-Space-Odyssey--Novel--Arthur-C-Clarke.pdf  
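
Because a bad pattern can mangle hundreds of files at once, it's worth knowing about rename's -n flag, which prints what would be renamed without actually touching anything:

# dry run: show the planned renames, change nothing
rename -n 's/[ _()]/-/g' *  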

Tip 6: Sharing your bash history across multiple sessions

By default, your bash session history is saved only when your session ends, so with several sessions open at once, the last one to exit overwrites what the others wrote. This was just fine when people worked with only one session at a time. Nowadays it's easy to work with multiple terminal sessions by just opening a new window or, in the case of Terminal, by opening a new tab.

In order to share your history across sessions, open your ~/.bashrc file and add the lines below:

export HISTCONTROL=ignoredups:erasedups  
shopt -s histappend  
export PROMPT_COMMAND="${PROMPT_COMMAND:+$PROMPT_COMMAND$'\n'}history -a; history -c; history -r"  
  • The first statement avoids duplicates in your command history.

  • The second statement makes bash append to the history file, instead of rewriting it every time.

  • Finally, the third line sets the PROMPT_COMMAND variable with the proper history commands to save the history file, clear the in-memory history list, and reload it from the file again. This happens each time the prompt is shown.


Tip 7: List all files that were modified by a command

Sometimes you run a command that you don't entirely know, and afterwards you're not sure what it modified in a certain directory of your system.

To avoid that problem, you can execute your command like this:

D="$(date "+%F %T.%N")"; <COMMAND>; find . -newermt "$D"  

Where <COMMAND> is what you want to execute.

This simply saves the current date (with nanosecond resolution) in a variable, then executes the command and searches the directory for files that were modified after the stored date.

Note that this only covers the current directory and its subdirectories, but you can change that by pointing find at a path other than ..

See this trivial example:

root@terminal41651:~ D="$(date "+%F %T.%N")"; mkdir files;  touch files/newfile; find . -newermt "$D"  
.
./files
./files/newfile

Note that I'm executing two commands in this case (mkdir and touch). You can use as many commands as you want, surrounded by D="$(date "+%F %T.%N")"; and find . -newermt "$D"
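
If you use this pattern often, you can wrap it in a small bash function (my own sketch, not from the original tip):

# run any command and list the files it modified under the current directory
list_mods() {
    local stamp
    stamp="$(date "+%F %T.%N")"
    "$@"                       # run the command with all its arguments
    find . -newermt "$stamp"   # list files modified since the timestamp
}

For example, list_mods tar -xf archive.tar extracts the archive and then lists everything it wrote.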


Tip 8: Show current connections and their details

You can use lsof -i -r to show all connections and their details, refreshing the listing in real time.

root@terminal26617:~ lsof -i -r  
COMMAND   PID  USER   FD   TYPE     DEVICE SIZE/OFF NODE NAME  
mysqld    735 mysql    0u  IPv4 3409549032      0t0  TCP localhost.localdomain:mysql->localhost.localdomain:37524 (ESTABLISHED)  
mysqld    735 mysql   25u  IPv4 1121802450      0t0  TCP localhost.localdomain:mysql (LISTEN)  
mysqld    735 mysql   76u  IPv4 3409549033      0t0  TCP localhost.localdomain:mysql->localhost.localdomain:37525 (ESTABLISHED)  
node    12407  root   10u  IPv4 3409549030      0t0  TCP localhost.localdomain:37524->localhost.localdomain:mysql (ESTABLISHED)  
node    12407  root   13u  IPv4 3409549031      0t0  TCP localhost.localdomain:37525->localhost.localdomain:mysql (ESTABLISHED)  
node    12407  root   20u  IPv4 1121802517      0t0  TCP *:http (LISTEN)  
sshd    12980  root    3u  IPv4 1121802518      0t0  TCP *:ssh (LISTEN)  
sshd    12980  root    4u  IPv6 1121802519      0t0  TCP *:ssh (LISTEN)  
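The -r flag also accepts an interval in seconds (the default is 15), and -i accepts an address filter, so you can watch a single service. For instance, to refresh the view of port 80 every five seconds:

lsof -i :80 -r 5  
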

Tip 9: Generate a random password

LANG=C < /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c${1:-16};echo;  

In the line above, we take a random stream from the /dev/urandom special device, keep only the permitted characters, and cut the result down to 16 characters.

root@terminal41657:~ LANG=C < /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c${1:-16};echo;  
31uLtZWQN58Y_6tg  
root@terminal41657:~ LANG=C < /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c${1:-16};echo;  
BJJrGvVdNg_ult2k  
root@terminal41657:~  LANG=C < /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c${1:-16};echo;  
zyiou-8Zkm5BRXwd  

You can change the value 16 to another number depending on how long your password needs to be.
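
The ${1:-16} expansion means "use the first positional argument, or 16 if none is given," which only does something useful inside a script or function. A minimal wrapper (my own sketch; I'm assuming you save it as genpass.sh):

#!/bin/bash
# usage: ./genpass.sh [length]    (length defaults to 16)
LANG=C < /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c"${1:-16}"
echo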


Tip 10: Use pv to actively monitor data sent through a pipe

With pv you can watch your piped operations by getting information like time elapsed, percentage of completion, throughput rate, and ETA.
Let's install it:

apt-get install pv  

(Ubuntu)

wget http://mirror-fpt-telecom.fpt.net/fedora/epel/6/i386/epel-release-6-8.noarch.rpm  
rpm -ivh epel-release-6-8.noarch.rpm  
yum install pv  

(CentOS, adding the EPEL repository)

And some usage examples:

pv 500M.out > copy.out  
500MB 0:00:01 [ 237MB/s] [==>  ] 50%  

Piping the contents of a file into another file

tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz  
 313MB 0:00:05 [53.7MB/s] [>     ]  2% ETA 0:03:59

Getting info about a tar and gzip process

root@terminal41060:~# pv /dev/zero > /dev/null  
143GB 0:00:11 [13.2GB/s] [                  <=>       ]  

See how fast your Terminal copies from /dev/zero to /dev/null

For more information please check the pv man pages.


Final notes

I hope you've enjoyed this list of tricks. All of them will work in Terminals.

Feel free to contribute with your own tricks in the comments section!

]]>
<![CDATA[An Introduction to Statistical Learning in R: Chapter 1]]>

http://blog.terminal.com/an-introduction-to-statistical-learning-in-r-chapter-1/aa10c64c-bc2a-4a0d-91bc-872709628e92Thu, 07 Jan 2016 02:25:47 GMT

An Introduction to Statistical Learning in R: Chapter 1

Introduction to the series

Welcome. As a former R user who is out of practice, I decided a great book to work through would be An Introduction to Statistical Learning With Applications in R (which I'll refer to as ISLR from here on) by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani.

It's a well-regarded book, a more application-oriented and less theoretical companion to Elements of Statistical Learning, which is an older and very famous book by Trevor Hastie, Robert Tibshirani, and Jerome Friedman that serves as a guide and reference for dozens of the most popular machine learning algorithms. Thanks to the generosity of the authors and Springer, both books are freely available online (just follow the links).

Unlike its more mathematical cousin, ISLR includes discussion of R code for running algorithms, and the authors provide example code for the labs as well. They also provide a package ISLR on CRAN with most of the datasets, which you can obtain by running install.packages("ISLR").

In this series, I want to provide more discussion around the code examples - and introduce some R tips and tricks the authors don't mention.


Introduction to chapter 1

In chapter 1, the authors introduce some of the datasets they will be looking at, talk about what's coming up in the rest of the book, and discuss the history of machine learning.

There's no lab in chapter 1 - the book's introduction to R is in chapter 2. But since R is a great tool for exploring datasets, I want to kick off this series by looking at the datasets. This gives us a gentle introduction to R by playing with data.

Along the way, I'll introduce two of my favorite packages for playing with data: dplyr and ggplot2.

If you want to follow along, you can set up R and RStudio yourself - I have an earlier post on doing this with Ubuntu - or you can start this snapshot on Terminal.com, which is already set up with the ggplot2, dplyr, and ISLR packages.

Dplyr and ggplot2

To add dplyr and ggplot2 to your R session's namespace, first make sure you've run install.packages("dplyr") and install.packages("ggplot2"). Once you have, you can run these commands:

library(dplyr)  
library(ggplot2)  

Both of these packages are written by Hadley Wickham, a developer who's written many popular data-analysis tools for R. The dplyr package makes slicing and dicing data - for example, using commands similar to SQL when appropriate - fast and easy. It provides nicer printing of R data.frames, which you can access by replacing a data.frame df with tbl_df(df).

It also uses the magrittr package's %>% operator, which is similar to F#'s pipe operator: the expression a %>% f(b, c) is equivalent to f(a, b, c). This can lead to much more readable code: for example, compare h(g(f(a), b), c) to f(a) %>% g(b) %>% h(c).
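
A tiny self-contained illustration (my own example, not from the book):

# without the pipe: read inside-out
round(sqrt(sum(c(1, 2, 3))), 2)
#> [1] 2.45
# with the pipe: read left to right
c(1, 2, 3) %>% sum() %>% sqrt() %>% round(2)
#> [1] 2.45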

The ggplot2 package implements the "Grammar of Graphics." R has some wonderful built-in plots for looking at raw data and mathematical objects, but ggplot2 has a very powerful domain-specific plotting language that makes plots of data.frames very easy, and the plots it produces are very pretty (in my opinion, of course). It's so popular, in fact, that the theme and the DSL have both been ported to Python!

Taking a look at the Wage data

The Wage dataset covers wages and various socioeconomic variables for working men.

full_wages = tbl_df(ISLR::Wage)  
dim(full_wages)  
#> [1] 3000   12
colnames(full_wages)  
#>  [1] "year"       "age"        "sex"        "maritl"     "race"      
#>  [6] "education"  "region"     "jobclass"   "health"     "health_ins"
#> [11] "logwage"    "wage"

There are a lot of variables here. Our goal is to produce the plots shown in the book, so let's focus on the variables age, year, and education. We can extract a data.frame with just these using the select function from dplyr:

wages = full_wages %>% select(wage, education, year, age)  
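
As a quick aside (my own example, not a figure from the book), dplyr also makes summaries like the median wage within each education level a one-liner:

wages %>%  
  group_by(education) %>%  
  summarise(median_wage = median(wage), workers = n())  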

In ISLR, this dataset is mostly used for regression models where we take a bunch of variables and try to predict a numeric outcome - in this case, wage.

Let's use ggplot2 to get some visualizations of how wage relates to the other variables.

wage and education

We'll start with boxplots showing how education interacts with wage. To make a plot using ggplot2, first we create a bare plot by passing the ggplot() function a dataset, and then we build the plot up. In this case, we want to map the education and wage variables to the aesthetics x and y, and add boxplots to the graph:

ggplot(wages) + aes(x = education, y = wage) + geom_boxplot()  

This is nice, but ISLR has colors in its boxplots. Who doesn't love colors? In ggplot2, the fill color is just another aesthetic (color is also an aesthetic, but it only colors the borders of boxplots):

ggplot(wages) + aes(x = education, y = wage, fill=education) + geom_boxplot()  

Notice how nice the colors look. ggplot2's colors are designed based on studies of how we perceive color, in order to give us a balanced visualization of the data.

Before moving on, let's look at one more plot. Boxplots lose a lot of information about the shape of a distribution, and often it helps to see this information. We can get a nice smooth estimate of the density using a violin plot, which in ggplot2 means replacing geom_boxplot() with geom_violin():

ggplot(wages) + aes(x = education, y = wage, fill=education) + geom_violin()  

Notice how we can now see that the advanced degree holders actually split into two groups, one that makes only a little more than typical college grads and another that makes much more. With the boxplot, all we saw were some outliers, and not a clear picture of how they relate to the overall distribution.

wage and year

Next let's look at how year and wage interact. The authors show the data as a scatterplot, with a straight-line fit going through it. In ggplot2, we accomplish this by adding geom_point() to the plot, and stat_smooth():

ggplot(wages) + aes(x = year, y = wage) + geom_point() + stat_smooth(method=lm)  

This plot is actually pretty hard to read because of how the points are stacked on top of one another. We can improve the visualization by adding a jitter to the values, which spreads them out a bit. To do this, we just replace geom_point with geom_jitter:

ggplot(wages) + aes(x = year, y = wage) + geom_jitter() + stat_smooth(method=lm)  


wage and age

The one other plot in ISLR is a scatterplot of wages versus age:

ggplot(wages) + aes(age, wage) + geom_point() + stat_smooth()  

This plot looks pretty good, but the points are still stacked up on each other a bit. We could fix this by adding a jitter like we did when plotting year versus wage. Another way of getting a nice visualization is to add alpha, or transparency, to the points and make them a bit bigger. This produces a cloud-like effect where we can see the parts of the plot with more and less data:

ggplot(wages) + aes(age, wage) + geom_point(alpha=0.15, size=5) + stat_smooth()  


Looking at more than two variables at once

Often we want to look at more than just two variables in one plot, in which case we need to make clever use of colors or multiple side-by-side plots. ggplot2 has a concept called a facet for making side-by-side plots. See what happens if we take the age and wage scatterplot and add a facet_wrap on education:

ggplot(wages) + aes(age, wage) + geom_point(size=1, aes(color=education)) + facet_wrap(~ education)  



The Smarket dataset

The stock market data gives today's market movement, along with a measure of trading volume and several normalized lags in market movement. Let's take a peek:

smkt = tbl_df(ISLR::Smarket)  
smkt  
#> Source: local data frame [1,250 x 9]
#> 
#>    Year   Lag1   Lag2   Lag3   Lag4   Lag5 Volume  Today Direction
#> 1  2001  0.381 -0.192 -2.624 -1.055  5.010 1.1913  0.959        Up
#> 2  2001  0.959  0.381 -0.192 -2.624 -1.055 1.2965  1.032        Up
#> 3  2001  1.032  0.959  0.381 -0.192 -2.624 1.4112 -0.623      Down
#> 4  2001 -0.623  1.032  0.959  0.381 -0.192 1.2760  0.614        Up
#> 5  2001  0.614 -0.623  1.032  0.959  0.381 1.2057  0.213        Up
#> 6  2001  0.213  0.614 -0.623  1.032  0.959 1.3491  1.392        Up
#> 7  2001  1.392  0.213  0.614 -0.623  1.032 1.4450 -0.403      Down
#> 8  2001 -0.403  1.392  0.213  0.614 -0.623 1.4078  0.027        Up
#> 9  2001  0.027 -0.403  1.392  0.213  0.614 1.1640  1.303        Up
#> 10 2001  1.303  0.027 -0.403  1.392  0.213 1.2326  0.287        Up
#> ..  ...    ...    ...    ...    ...    ...    ...    ...       ..

In finance, usually we would want to predict the amount by which the stock will move in upcoming days, but ISLR uses this example dataset mainly for classification problems: can we predict based on previous days' returns whether the market will go up or down?

This data is not easy to visualize. This is pretty typical of stock market data, where the correlations are usually very low. Chapter 1 of ISLR has boxplots of the market direction against the preceding days' movements for several lags. Since they aren't very informative plots, we'll just look at the one-lag case.

And instead of boxplots, which we've seen already, let's make density plots (which are sort of like smooth histograms) of the distributions of Lag1, the preceding day's return, conditional on whether the market went up or down. In ggplot2, we do this by using geom_density, and mapping Direction to an aesthetic such as fill. Note that if we don't want one plot to hide the other, we need to use alpha here:

ggplot(smkt) + aes(Lag1, fill=Direction) + geom_density(alpha=0.5)  


The NCI60 gene expression dataset

A first look

The NCI60 gene expression dataset is a list. Its $data entry holds a matrix with 64 observations of 6,830 gene expression measurements on cancer cell lines, while the $labs entry is a length-64 character vector with the type of cancer.

nci60 = ISLR::NCI60  
class(nci60)  
#> [1] "list"
names(nci60)  
#> [1] "data" "labs"
class(nci60$data)  
#> [1] "matrix"
dim(nci60$data)  
#> [1]   64 6830
nci60$labs  
#>  [1] "CNS"         "CNS"         "CNS"         "RENAL"       "BREAST"     
#>  [6] "CNS"         "CNS"         "BREAST"      "NSCLC"       "NSCLC"      
#> [11] "RENAL"       "RENAL"       "RENAL"       "RENAL"       "RENAL"      
#> [16] "RENAL"       "RENAL"       "BREAST"      "NSCLC"       "RENAL"      
#> [21] "UNKNOWN"     "OVARIAN"     "MELANOMA"    "PROSTATE"    "OVARIAN"    
#> [26] "OVARIAN"     "OVARIAN"     "OVARIAN"     "OVARIAN"     "PROSTATE"   
#> [31] "NSCLC"       "NSCLC"       "NSCLC"       "LEUKEMIA"    "K562B-repro"
#> [36] "K562A-repro" "LEUKEMIA"    "LEUKEMIA"    "LEUKEMIA"    "LEUKEMIA"   
#> [41] "LEUKEMIA"    "COLON"       "COLON"       "COLON"       "COLON"      
#> [46] "COLON"       "COLON"       "COLON"       "MCF7A-repro" "BREAST"     
#> [51] "MCF7D-repro" "BREAST"      "NSCLC"       "NSCLC"       "NSCLC"      
#> [56] "MELANOMA"    "BREAST"      "BREAST"      "MELANOMA"    "MELANOMA"   
#> [61] "MELANOMA"    "MELANOMA"    "MELANOMA"    "MELANOMA"

Researchers might want to use a dataset like this, which has labels for the cancer type, in classification. But often such labels are missing in practice, and researchers need to try to invent their own classes. This problem is called clustering, and we'll look at it in later chapters of ISLR.

A dataset like this which has true labels available is great for playing around with clustering models, since the true labels give us a way to judge whether our clustering algorithm works in practice.

Aside from clustering, another type of algorithm which is useful in both machine learning and everyday data analysis is dimensionality reduction, where we take many variables and try to produce just a few that contain most of the information. One of the most popular of these dimension-reduction algorithms is called principal components analysis, or PCA.

The PCA algorithm (you can skip the math)

We'll see how PCA works later in ISLR. Here's a tiny introduction for the brave:

First, we normalize our data; otherwise the principal components tend to just tell us which columns are the most spread out. Next, we use the singular value decomposition of our data matrix to express it as \(X = U D V^T\), where \(V\) is 6,830 by 64, and \(U\) and \(D\) are 64 by 64 (\(D\) being diagonal). The principal component scores of the data are the columns of \(U\), and the "magnitudes" of these principal components are given by the diagonal entries of \(D\).

We can get the SVD of the normalized matrix - hence the principal components - as follows:

# normalize the variables before running pca
colMus = colMeans(nci60$data)  
colSds = apply(nci60$data, 2, sd)  
# note: plain (nci60$data - colMus) would recycle colMus down the rows,
# so we use scale() to center and scale each column correctly
normalized = scale(nci60$data, center = colMus, scale = colSds)  
# get the svd of the matrix
duv = svd(normalized)  
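
As an aside (my addition, not from the book): base R's prcomp function performs this same centered-and-scaled SVD in one call, which is handy for cross-checking the result:

pca = prcomp(nci60$data, scale. = TRUE)  
# pca$rotation corresponds to V, and pca$sdev equals
# duv$d / sqrt(nrow(nci60$data) - 1)
head(pca$sdev)  
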
Visualizing the results of PCA

Ideally, when we perform PCA we see a steep dropoff in the magnitude of the principal components, which makes it easy to judge how many of them are important. We'll discuss this more later when we get to the PCA part of the book. Unfortunately, in this case the dropoff is rather smooth, which makes it hard to know how many components might be "important":

qplot(1:length(duv$d), duv$d,  
      xlab = "principal component", ylab = "diagonal scale factor")

Above, we used a new function from ggplot2 called qplot, which is convenient when you want to make a graph of x-versus-y for data that isn't part of a data.frame.

The principal components are "directions", in the high-dimensional space spanned by our data, which capture a lot of information about the data. Again, we'll see this in more detail later, but for now let's take a look at what our data looks like in the space spanned by the first two principal components.

qplot(duv$u[,1], duv$u[,2], size=I(3),  
      xlab = "1st component", ylab = "2nd component",
      color=nci60$labs)

Notice that looking at these two components, we can already start to see groups of related data points.

Next week

Now that you've seen the power of R for plotting data, we'll look at more quantitative tools next week, as we go through Chapter 2 of ISLR.

In the meantime, feel free to leave feedback or questions in the comments.

]]>
<![CDATA[What are your thoughts?]]>http://blog.terminal.com/what-are-your-thoughts/335b52b2-61ae-402b-af91-87a2e5f87503Tue, 29 Dec 2015 22:25:17 GMT What are your thoughts?

Our mission is to make software that solves real computing problems and helps engineers work better.

To that end, we'd like to learn more about how you're currently using cloud services and what you might like to see in 2016, to help us improve our products.

What do you like about the cloud? What capabilities do you wish were available? What problems do you typically encounter?

We know your time is valuable, so we're giving away Amazon gift cards worth $20 each to five participants -- the survey is anonymous, so be sure to leave your contact info on its last page if you'd like a chance to win.

Access the survey here.



Thanks all -- we're excited to hear what you have to say.

Varun
CEO & Founder, Terminal.com

]]>
<![CDATA[Using Daemon to create Linux services in 5 minutes]]>

http://blog.terminal.com/using-daemon-to-daemonize-your-programs/027edb0c-a2da-40be-ac6f-cef0868ac610Fri, 06 Nov 2015 19:26:06 GMT Using Daemon to create Linux services in 5 minutes

Daemon is a simple Linux utility designed to turn a process into a daemon. It essentially forks child processes that run your programs, turning them into daemons.

Daemon has many useful features, such as configuring permissions, respawning dead processes, and managing child daemons as though they were independent services.

If you're just interested in the "5 minutes" version of this post, jump straight to the Bonus Track section.


Installation

Daemon can be installed in any RPM-based Linux distribution by using the RPM package.

In non-RPM distros you can compile it from its source code.

We've also created this DEB package for easy installation in DEB-based distributions like Debian or Ubuntu.


Running it manually

Once it's installed, you can use daemon to run a program either as a Linux service or as a standalone background daemon. Let's see an example.

First, I create this simple script, to be used later as the daemonized program:

#!/bin/bash
echo "$(date) Starting Script" >> /tmp/output  
while true  
do  
        echo "$(date)" >> /tmp/output  
        sleep 5  
done  

The script just appends the current date to a file every five seconds. I named the script test.sh.

Now, by using daemon, we launch the script using its full path:

[root@terminal40162 ~] daemon -r /root/test.sh

[root@terminal40162 ~] ps -ef | grep test
root     11421     1  0 14:11 ?        00:00:00 daemon -r /root/test.sh  
root     11422 11421  0 14:11 ?        00:00:00 /bin/bash /root/test.sh  

As you can see, daemon forked a child process running the test.sh script.
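
Since we didn't give this daemon a name, the simplest way to stop it is to kill the parent daemon process. A more manageable pattern (a sketch built from the same flags the init script below uses) is to name the daemon so it can be queried and stopped later:

# launch with a name and a pidfile directory so it can be managed later
daemon -r --name test --pidfiles /var/run /root/test.sh  
daemon --running --name test --pidfiles /var/run && echo "test is running"  
daemon --stop --name test --pidfiles /var/run  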


Auto respawn

In the previous example, we launched the test.sh script through daemon with the -r flag.

According to the daemon man page, the -r flag auto-respawns the 'daemonized' program if its process dies.


In this example, I'm using screen to split the screen and show the results of killing the process that was previously launched with daemon -r. As you can see, daemon executes the process again when it dies.


Running 'daemonized' processes using startup scripts

Depending on which Unix / Linux flavor and distribution you're running, there are different ways to set up a service. In this post I will cover two basic examples.

Initd script

In order to configure a daemon process in a system that uses the traditional SysVinit services startup method at boot, you need to create shell scripts that respond to start, stop, restart, and (when supported) reload commands.

Specifically, these must be saved to the /etc/init.d directory where they can be invoked directly or (most commonly) via some other trigger, like the presence of a symbolic link in any of the /etc/rc?.d/ directories.

Let's go back to our 'daemonized' script example, and make a simple SysVinit compatible script.

#!/bin/sh
#
# TEST          Start/Stop our TEST example script.
#
# chkconfig: 2345 90 60
# description: This is a simple service script that was made to demonstrate \
# how to write SysVinit scripts to 'daemonize' programs.
#
# The daemon's name (to ensure uniqueness and for stop, restart and status)
name="TEST"  
# The path of the client executable
command="/root/test.sh"  
# Any command line arguments for the client executable
command_args=""  
# The path of the daemon executable
daemon="/usr/local/bin/daemon"

[ -x "$daemon" ] || exit 0
[ -x "$command" ] || exit 0

# Note: The following daemon option arguments could be in /etc/daemon.conf
# instead. That would probably be better because if the command itself were
# there as well then we could just use the name here to start the daemon.
# Here's some code to do it here in case you prefer that.

# Any command line arguments for the daemon executable (when starting)
daemon_start_args="" # e.g. --inherit --env="ENV=VAR" --unsafe  
# The pidfile directory (need to force this so status works for normal users)
pidfiles="/var/run"  
# The user[:group] to run as (if not to be run as root)
user=""  
# The path to chroot to (otherwise /)
chroot=""  
# The path to chdir to (otherwise /)
chdir=""  
# The umask to adopt, if any
umask=""  
# The syslog facility or filename for the client's stdout (otherwise discarded)
stdout="daemon.info"  
# The syslog facility or filename for the client's stderr (otherwise discarded)
stderr="daemon.err"

case "$1" in  
    start)
        # This if statement isn't strictly necessary but it's user friendly
        if "$daemon" --running --name "$name" --pidfiles "$pidfiles"
        then
            echo "$name is already running."
        else
            echo -n "Starting $name..."
            "$daemon" --respawn $daemon_start_args \
                --name "$name" --pidfiles "$pidfiles" \
                ${user:+--user $user} ${chroot:+--chroot $chroot} \
                ${chdir:+--chdir $chdir} ${umask:+--umask $umask} \
                ${stdout:+--stdout $stdout} ${stderr:+--stderr $stderr} \
                -- \
                "$command" $command_args
            echo done.
        fi
        ;;

    stop)
        # This if statement isn't strictly necessary but it's user friendly
        if "$daemon" --running --name "$name" --pidfiles "$pidfiles"
        then
            echo -n "Stopping $name..."
            "$daemon" --stop --name "$name" --pidfiles "$pidfiles"
            echo done.
        else
            echo "$name is not running."
        fi
        ;;

    restart|reload)
        if "$daemon" --running --name "$name" --pidfiles "$pidfiles"
        then
            echo -n "Restarting $name..."
            "$daemon" --restart --name "$name" --pidfiles "$pidfiles"
            echo done.
        else
            echo "$name is not running."
            exit 1
        fi
        ;;

    status)
        "$daemon" --running --name "$name" --pidfiles "$pidfiles" --verbose
        ;;

    *)
        echo "usage: $0 <start|stop|restart|reload|status>" >&2
        exit 1
esac

exit 0  

Note: The original version of this file can be found here. I've removed several comments and GNU license notices to keep it as clean as possible.

Let's put that code in a file, in /etc/init.d. I've called my file test.

Remember to make your file executable. You can do that by using the chmod command, for instance:

chmod +x /etc/init.d/test  

We're ready to test it.

[root@terminal40162 ~] /etc/init.d/test start
Starting TEST...done.

[root@terminal40162 ~] /etc/init.d/test status
daemon:  TEST is running (pid 17039)

[root@terminal40162 ~] ps -ef | grep test | grep -v grep
root     17039     1  0 18:05 ?        00:00:00 /usr/local/bin/daemon --respawn --name TEST --pidfiles /var/run --stdout daemon.info --stderr daemon.err -- /root/test.sh  
root     17040 17039  0 18:05 ?        00:00:00 /bin/bash /root/test.sh

[root@terminal40162 ~] /etc/init.d/test stop
Stopping TEST...done.  

At this point, we just need to make sure our new service starts at boot time.
Traditionally, you would have to create symlinks to your script in the different /etc/rc?.d directories -- you can still do this of course, but this time we will use the chkconfig utility to manage our service easily.

Note that we've added this specific section to make our script compatible with chkconfig, specifying the default runlevels (2, 3, 4, and 5):

#!/bin/sh
#
# TEST          Start/Stop our TEST example script.
#
# chkconfig: 2345 90 60
# description: This is a simple service script that was made to demonstrate \
# how to write SysVinit scripts to 'daemonize' programs.
#

Now, we just need to add our test script to the chkconfig list by executing chkconfig --add test and see the results:

[root@terminal40162 ~] chkconfig --add test

[root@terminal40162 ~] chkconfig --list test
test            0:off   1:off   2:on    3:on    4:on    5:on    6:off

[root@terminal40162 ~] find /etc -name "*test"
/etc/rc.d/rc0.d/K60test
/etc/rc.d/rc1.d/K60test
/etc/rc.d/rc5.d/S90test
/etc/rc.d/rc3.d/S90test
/etc/rc.d/init.d/test
/etc/rc.d/rc2.d/S90test
/etc/rc.d/rc4.d/S90test
/etc/rc.d/rc6.d/K60test

As you can see, the test service has been added to the chkconfig list and all the symlinks were created as needed.

Upstart script (config file)

If you prefer to manage your services with Upstart, you can also create an Upstart config file for this purpose.

Take a look at this example, based on the same test script that we've created before.

description "Daemonized Test service -  Upstart script"  
author "Enrique Conci"

start on runlevel [2345]  
stop on runlevel [!2345]  
respawn

env name="TEST"  
env command="/root/test.sh"  
env command_args=""  
env daemon="/usr/local/bin/daemon"  
env daemon_start_args=""  
env pidfiles="/var/run"  
env user=""  
env chroot=""  
env chdir=""  
env umask=""  
env stdout="daemon.info"  
env stderr="daemon.err"


pre-start script  
[ -x "$daemon" ] || exit 0
[ -x "$command" ] || exit 0
end script

exec "$daemon" --respawn $daemon_start_args \  
                --name "$name" --pidfiles "$pidfiles" \
                ${user:+--user $user} ${chroot:+--chroot $chroot} \
                ${chdir:+--chdir $chdir} ${umask:+--umask $umask} \
                ${stdout:+--stdout $stdout} ${stderr:+--stderr $stderr} \
                -- \
                "$command" $command_args

pre-stop script  
"$daemon" --stop --name "$name" --pidfiles "$pidfiles"
end script  

The next step is to put all that into a new .conf upstart file located in the /etc/init/ directory and check its syntax:

[root@terminal40250:/etc/init] init-checkconf /etc/init/test.conf
File /etc/init/test.conf: syntax ok  

It's important to use the .conf extension in every Upstart config file.

At this point, we're ready to test it.

[root@terminal40250:~] service test start
test start/running, process 9902

[root@terminal40250:~] ps -ef | grep 9902
root      9902     1  0 20:35 ?        00:00:00 /usr/local/bin/daemon --respawn --name TEST --pidfiles /var/run --stdout daemon.info --stderr daemon.err -- /root/test.sh

[root@terminal40250:~] service test stop
test stop/waiting  

It works!


Bonus track! - Automating the process

It could be a little tedious to follow all the steps to install Daemon, create the Upstart/Init scripts, and configure them.

In order to speed up the whole process, we've created a simple bash script that will detect your Terminal operating system and install, configure, and start a daemonized program in a single step.


Installing and Using daemonize.sh

Installing the daemonize.sh script is quite simple. You just need to get the script and make it executable.

wget https://raw.githubusercontent.com/terminalcloud/terminal-tools/master/daemonize.sh  
chmod +x daemonize.sh  



To use daemonize.sh just execute it like this:

./daemonize.sh <service_name> <command> <'arguments'>

...where service_name is an arbitrary name for your new service, command is the executable file that you want to turn into a service, and 'arguments' is a string passed as arguments to your command/program. The 'arguments' parameter is optional.

Example:

[root@terminal41464 ~] ./daemonize.sh sample_service /root/test.sh
Daemon already installed  
Checking if chkconfig is available  
chkconfig installed  
Installing SysV Init script  
Starting sample_service...done.

[root@terminal41464 ~] ps -ef | grep sample_service
root      1168     1  0 18:14 ?        00:00:00 /usr/local/bin/daemon --respawn --name sample_service --pidfiles /var/run --stdout daemon.info --stderr daemon.err -- /root/test.sh

[root@terminal41464 ~] chkconfig | grep sample
sample_service  0:off   1:off   2:on    3:on    4:on    5:on    6:off  

As you can see, the service has been installed and it's already running.


Conclusion and final notes

I hope you found this post useful.

For more background info, check out our blog posts on getting started with Linux services and getting started with Upstart.

Now go out there and make something amazing!

]]>
<![CDATA[Ten unsorted Vi / Vim tricks: Volume 2]]>http://blog.terminal.com/ten-unsorted-vi-vim-tricks-volume-2/59c997f7-3c97-4751-a20c-08f4cd35110cFri, 23 Oct 2015 18:04:17 GMT
Edit text like a pro with Vi.
Ten unsorted Vi / Vim tricks: Volume 2

This is my second post about Vi / Vim tricks. Check out my first post for even more tips.

Please note that this list is unsorted, and there are many other tricks to be learned. If there's a specific topic or question you'd like us to cover, please leave us a comment.


Tip 1) Open the file under the cursor

Believe me, this trick is great. You can open a file in Vim just by pressing g f, with the cursor over the file name.



Tip 2) Show, set, and reset Vim's variables

More than a trick, this is part of the standard Vim usage and configuration. Many people don't use the Vim configuration variables, so we're including these in our list as well.

  • :set will show the config variables that are currently different from the default value
  • :set all will show all config variables
  • :set <var>? will retrieve the value of just one variable (<var>)
  • :set <var>& will set the variable (<var>) to its default value
  • :set <var>=<value> will assign a value to a variable



Tip 3) Insert the output of an external command

You can use the :r !<command> Vim command to insert the output of an external console command into the file that you're currently editing. For example, :r !date inserts the current date below the cursor.



Tip 4) Enable syntax highlighting

In some cases the syntax highlighting feature is disabled by default. In order to enable it, you can use the :syntax on command. You can disable it again by executing :syntax off.



Tip 5) Jump to the matching bracket and select contents

You can use the % key to jump to a matching opening or closing parenthesis ( ), curly brace { }, or square bracket [ ]. This is especially useful when using Vim to write code.

Additionally, you can remap the % key (change the default action) to visually select the code in between. You can do that by executing :noremap % v%.



Tip 6) Highlight search matches

By executing :set hlsearch, you can highlight all search matches at the same time. The highlighting can be temporarily cleared with the :nohlsearch command or its abbreviated version, :nohl.



Tip 7) Record commands

With Vim, you can record several actions and then execute these actions again.

  • qa starts recording into register a
  • q stops recording
  • @a replays the recorded macro
  • @@ repeats the last replay

In this example, first I start recording by pressing q a, then I convert the first line to uppercase (with gUU) and append the word 'YEAH!!' to the end of the line. After that, I write the second line and stop recording by pressing q. I then apply the recorded macro to the second line by pressing @ a and repeat it by pressing @ @ a couple of times.


Tip 8) Find word under the cursor

This easy but powerful trick will let you find the next or previous occurrence of the word that's currently under the cursor.

  • Use * to search for the next occurrence
  • Use # to search for the prior occurrence



Tip 9) Command history

To get a list of all of the commands executed in Vim, use the :history command.


Tip 10) Easter eggs!

Try them yourself...

  • :help 42
  • :help holy-grail
  • :help!
  • :help UserGettingBored
  • :help bar
  • :Ni!

That's it! I hope you've enjoyed my post. As always, please leave any questions or followup requests in the comments section.

Now go out there and write something amazing!

]]>
<![CDATA[Ten unsorted Vi / Vim tricks: Volume 1]]>http://blog.terminal.com/vi-tips-and-tricks/3b470e58-96d7-4245-9a10-b7c49260107aTue, 06 Oct 2015 17:56:04 GMT
Edit text like a pro with Vi.
Ten unsorted Vi / Vim tricks: Volume 1

In this post I will present ten easy but useful tricks using Vi / Vim commands. Vim (Vi Improved) is the modern version of Vi, the screen-oriented text editor originally created for the Unix operating system. Vim is present or installable in all Linux distributions like the ones in the Terminal.com snapshot store.

You might ask, why Vi?

Starting a turf war is far from the objective of this post. Beyond personal preferences, learning Vi is useful for three main reasons:

  • Vi is always there. Since Vi is required by POSIX, you will find it in any *nix/Linux distribution.
  • Vi is extremely powerful. Editing text on a Unix-based system is an everyday matter, and with Vi you can speed up many editing tasks. Remember that in Unix everything is a file, and many of those files are text files.
  • Vi is highly configurable. Additional plugins for auto-completion, text replacement, syntax highlighting... believe it or not, all those things are available in Vi.

Please note that this list is unsorted, and there are many more tricks to be learned. If there's a specific topic or question you'd like us to cover, please leave us a comment!


Tip 1) Getting some help

Vi has extensive online help. You can access it by using the :h Vi command or the F1 key. If you're on a Mac, use fn + F1 instead of F1.

If you want to obtain specific help about a certain command, you can use the :h <command> Vi command. For instance: :h x


Tip 2) Search and replace

The search and replace function in Vi is one of the first tricks I learned, and one I use every day.

Searching for a character in a line

To search for a certain character within the current line, you can use the f command, followed by the character you're looking for.

In this example, I'm pressing f 5 repeatedly in command mode.

Searching the entire file

You can also search for a word across an entire text file within Vi, by using the / command. In this example I will search for the word class, and then I will press n to go to the next occurrence.
Note that after the last occurrence, if you continue pressing the n key, Vi will continue searching from the beginning of the file.

Searching for and replacing a word

Vi word replacement works more or less like in sed. The replacement command must be used starting with the colon character (:). Let's see an example:
Here I'm using the command :%s/app/application/gc to replace the word app with application. By adding the c flag at the end, I'm asked to confirm every replacement.


Tip 3) Lowercase to uppercase and vice versa

This one can be a little tricky, but once you understand it, it's like riding a bicycle.
All of the following are typed in normal (command) mode:

  • gU<n> then enter will convert to uppercase n+1 lines, from the cursor position. In this example, I press gU2 and enter.

  • gU<n> then right arrow will convert to uppercase n characters, from the cursor position. In this example, I press gU3 and right.

  • gU<n>w will convert to uppercase n words, from the cursor position. In this example, I press gU3w.

  • gUU will convert an entire line to uppercase.

Now, if you want to do the inverse operation (convert to lowercase) just replace the U with u in the command.

Likewise, gu<n> will convert n+1 lines to lowercase, gu<n>w will convert n words, guu will convert the entire line to lowercase, etc.


Tip 4) Show line numbers

This simple trick will show line numbers in your Vim editor. Just execute the command :set nu or :set number and your lines will be numbered. To remove the line numbers, use the :set nonumber command.



Tip 5) Execute an external command from within Vim

This function is also known as 'shell escape' as it has been used by many people to outsmart bad sudo security implementations.

To execute an external command from within Vim, just execute :!<command>. For instance, :!ls runs ls and shows its output before returning to the editor.

To do a shell escape you can use a shell (like bash) as the command.


Tip 6) Insert an existing file into the current one

In case you need to insert the contents of a file into the one you're currently writing, use the :r <file> command. For instance, :r /etc/hostname inserts that file's contents below the cursor.


Tip 7) Display changes performed since last save

This trick uses diff to display the difference between the file on disk and what you're currently editing. In order to do that, execute :w !diff % -


Tip 8) Indenting and un-indenting lines

You can indent a line by pressing >> (the > key, twice). You can also un-indent a line by pressing <<.

If you want to indent or un-indent a certain amount of lines, indicate the number of lines to be indented before the indentation command.

In this example, I execute 4>> to indent four lines and then 3<< to un-indent three of them.


Tip 9) Undo and redo

When you're in edit mode, press esc to go back to normal (command) mode, then press u to undo your last change.

You can press u repeatedly to undo more actions.

If you want to 're-do' your changes which were undone, press Ctrl-R.


Tip 10) Open Vi / Vim in a specific line

You can open Vim with the cursor in a certain line by executing Vi in this way:

vim <filename> +<line number>

For example, vim /etc/passwd +10 opens /etc/passwd with the cursor on line 10.


That's all folks! I hope you've enjoyed my post. As always, please leave any questions or followup requests in the comments section.

Now go out there and write something amazing!

]]>
<![CDATA[Getting started with WebTerminal]]>

http://blog.terminal.com/getting-started-with-webterminal/4ae2f88b-e740-4a1f-8bea-bab47a11aaf8Wed, 30 Sep 2015 18:21:00 GMT Getting started with WebTerminal

Introducing the new WebTerminal -- install and use Terminal on any server, in any infrastructure. Code, debug, and test collaboratively with WebTerminal.


Why use WebTerminal?

I've said this before. I love the Terminal IDE.

I think having the Terminal console, a file browser, an editor, and even embedded video chat all together is just great. But what if you want to develop in the Terminal environment on a specific server in your company's stack, or perhaps on your home Linux machine?

With WebTerminal, now you can.

WebTerminal can also simplify server access management by allowing coworkers to quickly and easily work together in-browser. Screensharing is a thing of the past; now you can watch server commands as they're typed, make edits directly from your machine, and chat side-by-side.

Looking back on my years as a system administrator, I think WebTerminal could have been a life-saving tool in situations like:

  • Collaborating with coworkers on a specific server
  • Code reviews
  • When you need someone else to teach you or help you with a specific issue
  • When you have to reconfigure or upgrade SSH
  • Coding interviews
  • Working remotely

Keep reading to learn how to install...


Installing WebTerminal

WebTerminal can be installed in any RPM or Deb-based Linux distribution. The installation is quite straightforward, and can be done in just a few simple steps.

On Ubuntu or Debian:
echo 'deb http://s3-us-west-1.amazonaws.com/cloudlabs.apt.repo/production /' | sudo tee -a /etc/apt/sources.list  
sudo apt-get update  
sudo apt-get install web-terminal  


On CentOS:
curl https://s3-us-west-1.amazonaws.com/cloudlabs.yum.repo/webterminal.repo | sudo tee -a /etc/yum.repos.d/webterminal.repo  
sudo yum install web-terminal  

WebTerminal basic configuration

The WebTerminal command options are listed below:

Usage: web-terminal COMMAND

List of Commands:  
start             Starts web-terminal server  
stop              Stops web-terminal server and clears state (open tabs, open files, etc)  
restart           Restarts web-terminal server without clearing state

set-port          Set port to listen on

enable-auth       Enable authentication  
disable-auth      Disable authentication  
configure-auth    Set username and password for authentication

enable-ssl        Enable SSL  
disable-ssl       Disable SSL  
configure-ssl     Set certificate details for SSL  

Before starting WebTerminal for the first time, I recommend making some basic configuration changes.

# web-terminal set-port 
What would you like the port to be? [8282]8080  
Port set to  8080.  
Restart your web-terminal with 'web-terminal restart' for changes to take effect  

This command sets the port to 8080. The WebTerminal default port is 8282.

# web-terminal configure-auth
Provide a username: [termuser]terminal  
New password:  
Re-type new password:  
Adding password for user terminal  
Auth configured. Enable authentication with "web-terminal enable-auth"

# web-terminal enable-auth
Auth enabled!  
Restart your web-terminal with 'web-terminal restart' for changes to take effect  

We highly recommend setting a username and password and enabling authentication as shown above. Otherwise, make sure you control who can reach the WebTerminal port.
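
If you do leave authentication off, at least restrict access at the firewall level. Here's a minimal sketch using iptables (the port and the trusted address 203.0.113.10 are examples -- adjust both to your setup):

iptables -A INPUT -p tcp --dport 8282 -s 203.0.113.10 -j ACCEPT  
iptables -A INPUT -p tcp --dport 8282 -j DROP  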


Using WebTerminal

After you've configured WebTerminal, you can start it simply by executing web-terminal start.

You will see something like this:

# web-terminal start
warn:    --minUptime not set. Defaulting to: 1000ms  
warn:    --spinSleepTime not set. Your script will exit if it does not stay up for at least 1000ms  
info:    Forever processing file: compute/server.js  



Then go to http://your-server-ip:port and log in with your username and password.



It's as simple as that. Enjoy it!

Have feedback for us, or using WebTerminal in cool ways? We'd love to hear about it in the comments.

]]>
<![CDATA[Working with services in Linux]]>

http://blog.terminal.com/working-with-services/826886b7-71da-446d-8da3-7f5c99bd0e00Mon, 14 Sep 2015 23:15:37 GMT Working with services in Linux

If you're just starting in the Linux world, you might be wondering about why there are so many different distributions: Ubuntu, CentOS, Debian... There seem to be a bunch of different names for similar things.

A Linux distribution is the stack of software bound to the Linux kernel; together they make the operating system, which most of us call Linux or GNU / Linux.

It's great! You've got a lot of options, most of them very well-crafted, and software is largely compatible across them. What differentiates one distro from another are a few things like package management and service management tools.

In this post, we will talk about Linux services and how to manage them in two of the most popular distributions.


What's a service?

A Linux service is a process or a set of processes running in the background. These kinds of processes are usually in charge of executing essential system tasks or running certain kinds of server applications like databases, scheduling applications, or even sound system applications.


Managing services with SystemV

Most Linux distributions have adopted SystemV (SysV)-like init scripts to manage their services.

At any moment, a running SysV-like system is in one of several predetermined states, called runlevels. One runlevel is the normal operating state of the system; other runlevels typically represent single-user mode, system shutdown, system restart, and various other states.

Switching from one runlevel to another causes a per-runlevel set of scripts to be run, which typically mount filesystems, start or stop services, and so on. In the classic SysV way, if you want to disable or enable the automatic start of a service at a certain runlevel, you have to create or delete symlinks in the respective rc<level>.d directory.

For instance, take a look at a typical /etc/rc3.d:

root@terminal35765:/etc/rc3.d# ls -ltr  
total 4  
-rw-r--r-- 1 root root 677 Mar 12  2014 README
lrwxrwxrwx 1 root root  18 Jul 23  2014 S99rc.local -> ../init.d/rc.local  
lrwxrwxrwx 1 root root  17 Jul 23  2014 S91apache2 -> ../init.d/apache2  
lrwxrwxrwx 1 root root  24 Jul 23  2014 S20screen-cleanup -> ../init.d/screen-cleanup  
lrwxrwxrwx 1 root root  19 Jul 23  2014 S20saslauthd -> ../init.d/saslauthd  
lrwxrwxrwx 1 root root  17 Jul 23  2014 S20postfix -> ../init.d/postfix  
lrwxrwxrwx 1 root root  24 Jul 23  2014 S20modules_dep.sh -> ../init.d/modules_dep.sh  

SysV symlinks start with the letter S or K.

  • S means "start" -- the linked script will run with the "start" parameter when that runlevel is entered
  • K means "kill" -- the linked script will run with the "stop" parameter when that runlevel is entered

As you can see, these are symlinks to the actual startup scripts, which you can call directly if you need to pass different parameters, like start, stop, restart, etc.
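
For example, to keep apache2 from starting at runlevel 3 the classic way, you would remove its S symlink, and you can still drive the service through its init script directly. A sketch based on the listing above:

rm /etc/rc3.d/S91apache2  
/etc/init.d/apache2 restart  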

But this 'strict' method is being obsoleted by newer, event-based startup systems in Linux distros like Ubuntu.


Talking about Ubuntu

As of this post, the most current LTS (long term support) version of Ubuntu available on Terminal is Ubuntu 14.04. This version of Ubuntu comes with Upstart installed as the main services management tool. Upstart is also compatible with the classic SysV init system.

Some of your running services' init scripts might not be converted to Upstart, so you might have to call those scripts directly.

Let's see how to manage services in an Ubuntu 14.04 machine.

Temporarily enable and disable services

To stop and start services temporarily you can use the service command. Keep in mind this will start or stop your services, but they will be back to their default state on the next boot.

Let's take a look at some classic examples:

service apache2 stop  

This will stop the apache2 service immediately, but if it's configured to start automatically, it will be back online after the next system reboot.

service apache2 start  

This will START the Apache service, assuming it was stopped before.

service apache2 status  

This will tell you the STATUS of the service (running or not running).

service apache2 restart  

This will stop the service gracefully and start it again.

Enabling / disabling a service

To permanently keep a service from starting under Upstart, you need to create an override file containing the manual stanza.

echo manual > /etc/init/SERVICE.override  

This will prevent Upstart from automatically starting the service on the next boot. The stanzas in a .override file take precedence over those in the original job file, so afterwards you will only be able to start the service manually. To undo this, simply delete the .override file.

For example:

echo manual > /etc/init/apache2.override  

...will configure the apache2 service into manual mode.

If you want to reconfigure Apache to start at the system boot again, you simply execute:

rm /etc/init/apache2.override  

The service will now start automatically again on the next boot.


Other tools...

There are several other tools to help you manage services, such as sysv-rc-conf.

sysv-rc-conf gives an easy-to-use interface for managing /etc/rc<runlevel>.d/ symlinks. The interface comes in two different flavors, one that simply allows turning services on or off and another that allows for more finely-tuned management of the symlinks.

Installing sysv-rc-conf

apt-get install sysv-rc-conf  

sysv-rc-conf basic usage

Running sysv-rc-conf without any arguments will bring up its text-based 'GUI'.

To start a service, press the + or = key. To stop a service, press the - key. This will call /etc/init.d/<service> start or /etc/init.d/<service> stop.
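
sysv-rc-conf can also be driven non-interactively, which is handy in scripts. A sketch, using apache2 as the example service:

sysv-rc-conf --level 35 apache2 on  

This enables apache2 at runlevels 3 and 5 without opening the interface.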


What about CentOS?

CentOS is a well-known free Linux distribution based on the commercial Red Hat Enterprise Linux. It manages its services in the classic SysV way, but it provides some extra tools to make this task more convenient.

Temporarily enable and disable services (start / stop)

Let's take a look at the different tools to work with services in CentOS.

Listing all services and their current status:

# service --status-all 
crond (pid  547) is running...  
htcacheclean is stopped  
httpd is stopped  
named is stopped  
nmbd is stopped  
nscd is stopped  
portreserve is stopped  
quota_nld is stopped  
rdisc is stopped  
rpcbind is stopped  
rsyslogd (pid  2147) is running...  
sandbox is stopped  
saslauthd is stopped  
sendmail is stopped  
sm-client is stopped  
smbd is stopped  
snmpd is stopped  
snmptrapd is stopped  
openssh-daemon (pid  536) is running...  
winbindd is stopped  
xinetd is stopped  

You can use the service command to check the status of a single service. For example:

# service sshd status
openssh-daemon (pid  536) is running...  

You can also use the service command to perform different actions like stop and start, or even custom commands depending on the service you're managing.

# service httpd start
Starting httpd:                                            [  OK  ]

# service httpd restart
Stopping httpd:                                            [  OK  ]  
Starting httpd:                                            [  OK  ]

# service httpd configtest
Syntax OK

# service httpd graceful
Starting httpd:                                            [  OK  ]

Enabling / disabling a service

The preferred tool to configure services to start at a certain runlevel in CentOS is chkconfig.

To list all services and their status at boot on each runlevel you can execute the command chkconfig with the --list argument:

# chkconfig --list
cloudlabside    0:off   1:off   2:on    3:on    4:on    5:on    6:off  
crond           0:off   1:off   2:on    3:on    4:on    5:on    6:off  
htcacheclean    0:off   1:off   2:off   3:off   4:off   5:off   6:off  
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off  
ip6tables       0:off   1:off   2:off   3:off   4:off   5:off   6:off  
iptables        0:off   1:off   2:off   3:off   4:off   5:off   6:off  
modules_dep     0:off   1:off   2:on    3:on    4:on    5:on    6:off  
named           0:off   1:off   2:off   3:off   4:off   5:off   6:off  
netconsole      0:off   1:off   2:off   3:off   4:off   5:off   6:off  
netfs           0:off   1:off   2:off   3:off   4:off   5:off   6:off  
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off  
nmb             0:off   1:off   2:off   3:off   4:off   5:off   6:off  
nscd            0:off   1:off   2:off   3:off   4:off   5:off   6:off  
portreserve     0:off   1:off   2:on    3:off   4:on    5:on    6:off  
quota_nld       0:off   1:off   2:off   3:off   4:off   5:off   6:off  
rdisc           0:off   1:off   2:off   3:off   4:off   5:off   6:off  
restorecond     0:off   1:off   2:off   3:off   4:off   5:off   6:off  
rpcbind         0:off   1:off   2:off   3:off   4:off   5:off   6:off  
rsyslog         0:off   1:off   2:off   3:off   4:off   5:off   6:off  
saslauthd       0:off   1:off   2:off   3:off   4:off   5:off   6:off  
sendmail        0:off   1:off   2:off   3:off   4:off   5:off   6:off  
smb             0:off   1:off   2:off   3:off   4:off   5:off   6:off  
snmpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off  
snmptrapd       0:off   1:off   2:off   3:off   4:off   5:off   6:off  
sshd            0:off   1:off   2:on    3:on    4:on    5:on    6:off  
udev-post       0:off   1:off   2:off   3:off   4:off   5:off   6:off  
winbind         0:off   1:off   2:off   3:off   4:off   5:off   6:off  
xinetd          0:off   1:off   2:off   3:off   4:off   5:off   6:off  

To disable or enable a service at boot (all runlevels), execute command chkconfig with the name of the service as the parameter and the off / on flags. For example:

# chkconfig sshd off
# chkconfig sshd on

You can also do the same but only for certain runlevels:

# chkconfig --level 3 httpd off
# chkconfig --level 5 httpd on

Bonus pack: SystemD

Starting with Ubuntu 15.04, Upstart is deprecated in favor of SystemD; Debian and CentOS have made the same move.

Temporarily enable and disable services (start / stop / restart)

SystemD provides the systemctl command to manage services. This example shows how to start, stop, and restart the httpd service:

# systemctl start httpd.service
# systemctl stop httpd.service
# systemctl restart httpd.service


To check the current status of a service, you can use the status parameter. For example:

# systemctl status httpd.service
httpd.service - The Apache HTTP Server  
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
   Active: active (running) since Sat 2015-08-22 18:35:30 PDT; 4 days ago
 Main PID: 1092 (httpd)
   Status: "Total requests: 0; Current requests/sec: 0; Current traffic:   0 B/sec"
   CGroup: /system.slice/httpd.service
           ├─ 1092 /usr/sbin/httpd -DFOREGROUND
           ├─19608 /usr/sbin/httpd -DFOREGROUND
           ├─19611 /usr/sbin/httpd -DFOREGROUND
           ├─21807 /usr/sbin/httpd -DFOREGROUND
           ├─22665 /usr/sbin/httpd -DFOREGROUND
           ├─22736 /usr/sbin/httpd -DFOREGROUND
           ├─22792 /usr/sbin/httpd -DFOREGROUND
           ├─22797 /usr/sbin/httpd -DFOREGROUND
           ├─22906 /usr/sbin/httpd -DFOREGROUND
           ├─22907 /usr/sbin/httpd -DFOREGROUND
           ├─22908 /usr/sbin/httpd -DFOREGROUND
           └─22988 /usr/sbin/httpd -DFOREGROUND

Aug 22 18:35:30 cloud.hdmz.com systemd[1]: Started The Apache HTTP Server.  
Aug 23 08:59:48 cloud.hdmz.com systemd[1]: Reloading The Apache HTTP Server.  
Aug 23 08:59:48 cloud.hdmz.com systemd[1]: Reloaded The Apache HTTP Server.  


Enabling / disabling a service

The same systemctl command is used to configure a service to start (or not) on the next boot.

# systemctl enable crond.service
ln -s '/usr/lib/systemd/system/crond.service' '/etc/systemd/system/multi-user.target.wants/crond.service'  
# systemctl disable crond.service
rm '/etc/systemd/system/multi-user.target.wants/crond.service'  



Additionally, you can check whether a service is configured to start on the next boot, or whether it's currently running, by using the is-enabled and is-active parameters of systemctl respectively.

# systemctl is-enabled  httpd.service
enabled  
# systemctl is-active  httpd.service
active  
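
Since is-active returns a zero exit code when the unit is running, it works nicely in shell scripts. A minimal sketch that starts httpd only if it isn't already running:

systemctl is-active httpd.service > /dev/null || systemctl start httpd.service  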


Finally, if you want to show all the information about the service, you can use the show parameter:

# systemctl show httpd.service
Id=httpd.service  
Names=httpd.service  
Requires=basic.target  
Wants=system.slice  
WantedBy=multi-user.target  
Conflicts=shutdown.target  
Before=shutdown.target multi-user.target  
After=network.target remote-fs.target nss-lookup.target systemd-journald.socket basic.target system.slice  
Description=The Apache HTTP Server  
LoadState=loaded  
ActiveState=active  
SubState=running  
FragmentPath=/usr/lib/systemd/system/httpd.service  
UnitFileState=enabled  
InactiveExitTimestamp=Sat 2015-08-22 18:35:29 PDT  
InactiveExitTimestampMonotonic=6475329  
ActiveEnterTimestamp=Sat 2015-08-22 18:35:30 PDT  
ActiveEnterTimestampMonotonic=7138275  
ActiveExitTimestampMonotonic=0  
InactiveEnterTimestampMonotonic=0  
CanStart=yes  
CanStop=yes  
CanReload=yes

[output truncated]

Conclusion

Understanding how services work in a Linux distribution is essential to having complete control over your system resources.

I hope you've enjoyed this article and find it useful as well. Now go out there and make something amazing!

]]>
<![CDATA[MongoDB replication and backup methods]]>http://blog.terminal.com/mongodb-replication-and-backup/0a6c7587-13e2-47f0-978a-04729c7da8d5Mon, 14 Sep 2015 22:08:29 GMT

MongoDB replication and backup methods

This post explains how to configure a replicated MongoDB database environment and create full database backups of a MongoDB instance to create redundancy, improve data availability, and provide failure recovery.


MongoDB replication

How does it work?

MongoDB handles replication through replication sets (called "replica sets" in MongoDB's own documentation). Conceptually, replication sets are similar to the master-slave configuration explained in our MySQL post.

In this arrangement, a single member (the "primary member") serves as the master source of replication for the rest of the members.

Additionally, replication sets can handle an automatic failover mechanism in case the primary member goes down for any reason.

Replication set members

A replication set member can assume different roles, described below:

Primary member:
The primary member is the only one that accepts write operations, and is the default source of replication transactions with the rest of the replication set members.

Secondary members:
The secondary members store the data, staying in sync by reproducing the changes made on the primary member; data is transferred to them asynchronously. If the primary member goes down, a secondary member can take the lead. A replication set can include several secondary members.

Additionally, a secondary member can be customized to act as a:

  • Priority 0 member: It will never become primary but will continue replicating data
  • Hidden member: Priority 0 member that's also not visible to clients
  • Delayed replication member: Secondary member that will be time delayed in comparison to the primary member, used as a safeguard against accidental deletions and for recovery purposes

Arbiter member:
An arbiter member is an optional component. Its main role is to act as a tie-breaker in failover situations.


Setting up a replicated MongoDB environment running on Terminals

From this point on I will present the steps needed to configure a MongoDB master member and two secondary members.

To avoid IP-address confusion in the configurations, we will set up DNS or hosts-file entries on all servers, so we can use server names instead of IPs.


1. Starting, updating and linking our Terminals

To save some time, we will start three new Terminals from the MongoDB snapshot in our app store.

First we need to make sure that our machines are up to date. In order to do that, we will execute the following commands in all our Terminals:

apt-get update  
apt-get -y upgrade  

In my case, the Terminals that I've created are called:

  • terminal37474 (I will use this one as the primary member)
  • terminal37475 (I will use this as a secondary member)
  • terminal37477 (I will use this as another secondary member)

Then, we'll use the tlinks tool to link our Terminals, making sure they can connect to each other on the MongoDB port.

$ ./tlinks.py link -s terminal37475 -p 22,27017  terminal37474
True  
$ ./tlinks.py link -s terminal37477 -p 22,27017  terminal37474
True  
$ ./tlinks.py link -s terminal37474 -p 22,27017 terminal37475
True  
$ ./tlinks.py link -s terminal37477 -p 22,27017 terminal37475
True  
$ ./tlinks.py link -s terminal37474 -p 22,27017 terminal37477
True  
$ ./tlinks.py link -s terminal37475 -p 22,27017 terminal37477
True  

Finally, as a non-mandatory but useful step, I will add my nodes' IP addresses to the primary member's /etc/hosts file. Note that I'm naming the primary member mongo0 and the secondary members mongo1 and mongo2.

For me, it looks like this:

240.58.47.161 mongo0 terminal37474  
240.58.47.202 mongo1 terminal37475  
240.58.47.204 mongo2 terminal37477  

2. Configuring replication

The next steps will enable replication among your MongoDB nodes.

1. Stop MongoDB in all servers
service mongod stop  


2. Edit the configuration file in all Terminals and apply the changes

The MongoDB configuration file is usually located at /etc/mongod.conf. Adjust the parameters as shown, and change the replSet value to something you can remember, like ReplicaSet0.

port=27017  
bind_ip=0.0.0.0  
replSet=ReplicaSet0  
fork = true  

Then, start each replication member by executing:

export LC_ALL=C  
mongod --config /etc/mongod.conf  


3. Start the replication set and add the members

Go to the Terminal that will work as the primary member (mongo0 in our example), open the MongoDB shell with the mongo command, and execute the rs.initiate() function:

root@terminal37474:~# mongo  
MongoDB shell version: 2.6.11  
connecting to: test  
Server has startup warnings:  
2015-09-11T18:06:12.454-0400 [initandlisten]  
> rs.initiate()
{
    "info2" : "no configuration explicitly specified -- making one",
    "me" : "terminal37474:27017",
    "info" : "Config now saved locally.  Should come online in about a minute.",
    "ok" : 1
}

This command will initiate the replication set, setting the current node as the primary member.

You can check it by executing rs.conf():

> rs.conf()
{
    "_id" : "ReplicaSet0",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "terminal37474:27017"
        }
    ]
}

You will notice that now the MongoDB command prompt looks different on that node:

ReplicaSet0:PRIMARY>  

Now, you will be able to add the other nodes by executing something like this:

ReplicaSet0:PRIMARY> rs.add("mongo1")  
{ "ok" : 1 }
ReplicaSet0:PRIMARY> rs.add("mongo2")  
{ "ok" : 1 }

Repeat rs.add() for each additional secondary member you want to include.

Your replication set should now be up and running. You can check its configuration at any time with the rs.conf() function.

ReplicaSet0:PRIMARY> rs.conf()  
{
    "_id" : "ReplicaSet0",
    "version" : 3,
    "members" : [
        {
            "_id" : 0,
            "host" : "terminal37474:27017"
        },
        {
            "_id" : 1,
            "host" : "mongo1:27017"
        },
        {
            "_id" : 2,
            "host" : "mongo2:27017"
        }
    ]
}
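
Besides rs.conf(), the rs.status() function reports the health and current replication state of each member. You can run it straight from the system shell without opening an interactive session:

mongo --eval "printjson(rs.status())"  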

MongoDB backup methods: mongodump and mongorestore



mongodump and mongorestore are included in the MongoDB package and are the preferred tools to execute database dumps and restores.


1. Dumping data

By default, mongodump will dump all documents in a Mongo instance (all collections).

Making a local dump
mongodump --out dumps  

The --out option lets you specify the directory where mongodump will save the dump files. mongodump creates a separate backup directory for each of the backed-up databases. For example, if your databases are called kitchen, bedroom, livingroom, and bathroom, your directory structure will look like:

dumps/  
    |- kitchen
    |- bedroom
    |- livingroom
    |- bathroom


Making dumps through the network

Let's consider this example:

mongodump --host mongo0 --port 3017 --username user --password pass --out /backup  

mongodump will write BSON files holding a copy of the data accessible via the mongod service listening on port 3017 of the mongo0 host.

As in the example above, any mongodump command may include credentials when the database requires authentication.


Limiting the dump to a collection and DB

Check out this example:

mongodump --collection home_products --db stock  

This command creates a dump of the collection named home_products from the database stock in a dump/ subdirectory of the current working directory.

Backing up a MongoDB instance without a running mongod service

Sometimes you may need to make a database dump while the mongod service is down. In that case, you need to point mongodump at the directory where the MongoDB instance stores its data. The location of that directory is configured in the /etc/mongod.conf file; you can check it by executing:

grep dbpath /etc/mongod.conf  
dbpath=/var/lib/mongodb  

For a MongoDB instance that contains several databases, the following mongodump operation backs them up using the --dbpath option, which specifies the location of the database files on the host:

mongodump --dbpath /var/lib/mongodb -o dataout  


Point-in-time dumps using oplogs

With mongodump, we can use the --oplog option. When we do this, mongodump will collect the oplog entries to build a point-in-time snapshot of a database within a replica set.

In that way, mongodump copies all the data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes.

This procedure will allow you to restore a backup that reflects a specific moment in time.
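
A point-in-time dump of a replica set member could look like the sketch below (the host and output directory are examples; note that --oplog dumps all databases, so it can't be combined with --db):

mongodump --host mongo0 --port 27017 --oplog --out /backup/pit-dump  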


2. Restoring data

Backups are only part of the process. Restoring data is also critical. The MongoDB package offers a utility called mongorestore for just this purpose.

The mongorestore utility restores a binary backup (dump) created by mongodump. By default, mongorestore looks for backups in the dump/ directory.

Using mongorestore with the MongoDB service running or in a remote server

This is the typical case for restore. Let's take a look at the example below.

mongorestore --host mongo0 --port 3017 --db my_database --username myusername --password mypassword /backup/mydatabase_dump  

In the above example, mongorestore will perform a merge if it sees that the database already exists; this might give you unexpected results. If you don't want to merge, just add the --drop option to your command.


Restoring without mongod running

Sometimes you will have to restore a database with the MongoDB service down. To restore a database dump in that condition, you will have to do it locally, specifying the database path as in the example below.

mongorestore --dbpath /var/lib/mongo --db my_database /backup/dump/my_database  


Restoring point-in-time oplog backups with mongorestore

If you've created a database dump using the --oplog option, use mongorestore with the --oplogReplay option, as in the following example:

mongorestore --dbpath /var/lib/mongo --db my_database --oplogReplay /backup/dumps/my_database  

Closing notes

I hope you've enjoyed this post. I've written it to complete the Replication and backup series.

If you have any questions or suggestions about this post or series, please leave a comment.

Thanks for reading!

]]>
<![CDATA[PostgreSQL replication and backup methods]]>http://blog.terminal.com/postgresql-replication-and-backup-methods/7d27fe90-1b11-484f-b3ae-fc4120f33185Mon, 24 Aug 2015 22:07:35 GMT


Replication provides redundancy and increases data availability, allowing you to recover from hardware failure and service interruptions. If you have multiple copies of your data, you can dedicate one to disaster recovery, reporting, or backup.


Setting up a typical PostgreSQL replicated environment on Terminals

We will start with two new Terminals, created from a base Ubuntu snapshot.

After installing PostgreSQL on each Terminal, we can configure a typical master-slave replicated environment.


1. Install PostgreSQL

PostgreSQL installation is quite straightforward in almost any Linux distribution. In our case, we will use apt to install it.

apt-get update  
apt-get -y install postgresql  

Install PostgreSQL in both Terminals the same way.

As a little post-installation step, we will assign a password for the postgres user. We will use it later to configure the instances to be accessed remotely.

passwd postgres  

2. Link Terminals to each other

This step is needed when you're working in Terminals.

To configure PostgreSQL replication, the servers must be able to communicate with each other. To link the Terminals we will use the tlinks command line tool, which lives in our terminal-tools GitHub repo.

I am only including the commands used to install tlinks here; for more information on how to install and configure it, check out this blog post.

apt-get -y install python-pip wget  
pip install terminalcloud  
wget https://raw.githubusercontent.com/terminalcloud/terminal-tools/master/tlinks.py  
chmod +x tlinks.py

./tlinks.py -u <myusertoken> -a <myaccestoken> show <anyterminal>

The last command shown is used to configure tlinks with your API tokens for the first time.

Now, we proceed to link our Terminals, allowing connections between ports 22 and 5432.

./tlinks.py link <master_terminal_subdomain> -s <slave_terminal_subdomain> -p 22,5432
./tlinks.py link <slave_terminal_subdomain> -s <master_terminal_subdomain> -p 22,5432

This will allow PostgreSQL and ssh connections in both directions between your Terminals.

You can check if the links were created correctly by using the show action.

./tlinks.py show <master_terminal_subdomain>
./tlinks.py show <slave_terminal_subdomain>



Check out my example:

enriques-mbp:~ enrique$ ./tlinks.py link qmaxquique3196 -s qmaxquique3197 -p 22,5432  
True  
enriques-mbp:~ enrique$ ./tlinks.py link qmaxquique3197 -s qmaxquique3196 -p 22,5432  
True  
enriques-mbp:~ enrique$ ./tlinks.py show qmaxquique3196  
Source        Port  
---------------------
qmaxquique3197    5432  
qmaxquique3197    22  
enriques-mbp:~ enrique$ ./tlinks.py show qmaxquique3197  
Source        Port  
---------------------
qmaxquique3196    5432  
qmaxquique3196    22  

3. Pre-configuration activities

Before configuring PostgreSQL replication itself, it's recommended to set up key-based SSH authentication between the two servers for the postgres user.

First, log in to the first server and su to the postgres user. (From now on, we will call this Terminal master.)

Create an SSH key and install it on the other server (which we will call slave).

For example:

root@master:~# su - postgres  
postgres@master:~$ ssh-keygen  
Generating public/private rsa key pair.  
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):  
Created directory '/var/lib/postgresql/.ssh'.  
Enter passphrase (empty for no passphrase):  
Enter same passphrase again:  
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa.  
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub.  
The key fingerprint is:  
4b:60:bb:a6:b6:95:9f:7c:c4:43:9b:80:79:9a:aa:6d postgres@master  
The key's randomart image is:  
+--[ RSA 2048]----+
|                 |
|                 |
|      oo         |
|     .ooo .      |
|      .+S+ o     |
|      o+ .*      |
|     .= .. .     |
|   .E+ o ..      |
|  .++.  +.       |
+-----------------+
postgres@master:~$ ssh-copy-id slave  
The authenticity of host 'slave (240.38.197.12)' can't be established.  
ECDSA key fingerprint is 8c:ad:98:0f:2f:68:6a:32:49:01:59:96:e0:65:92:a8.  
Are you sure you want to continue connecting (yes/no)? yes  
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
postgres@slave's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'slave'"  
and check to make sure that only the key(s) you wanted were added.  

Now let's login directly to the slave server to do the same.

postgres@master:~$ ssh slave  
Welcome to Ubuntu 14.04.2 LTS (GNU/Linux 2.6.32-042stab104.1 x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Mon Aug  3 10:30:33 2015 from 240.38.197.11  
postgres@slave:~$ ssh-keygen  
Generating public/private rsa key pair.  
Enter file in which to save the key (/var/lib/postgresql/.ssh/id_rsa):  
Enter passphrase (empty for no passphrase):  
Enter same passphrase again:  
Your identification has been saved in /var/lib/postgresql/.ssh/id_rsa.  
Your public key has been saved in /var/lib/postgresql/.ssh/id_rsa.pub.  
The key fingerprint is:  
aa:e3:5f:36:8e:7f:52:b2:20:7b:99:00:d2:0e:dd:f3 postgres@slave  
The key's randomart image is:  
+--[ RSA 2048]----+
|                 |
|                 |
| o .             |
|o + o            |
| + . o  S        |
|  . o E.. .      |
|     +.+++       |
|    o.+=o..      |
|   .o+o.oo       |
+-----------------+
postgres@slave:~$ ssh-copy-id master  
The authenticity of host 'master (240.38.197.11)' can't be established.  
ECDSA key fingerprint is 8c:ad:98:0f:2f:68:6a:32:49:01:59:96:e0:65:92:a8.  
Are you sure you want to continue connecting (yes/no)? yes  
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
master's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'master'"  
and check to make sure that only the key(s) you wanted were added.  

Don't forget to replace master and slave with the hostname of your servers or the FQDN of your Terminals.

Pro tip: You can add the master / slave server data in the /etc/hosts file, aliasing their hostnames with master and slave respectively. By doing that, you will be able to follow these tutorial steps nearly exactly.

For example:

240.38.197.11 qmaxquique3196 master  
240.38.197.12 qmaxquique3197 slave  

4. Configuring the master server

First, create a new PostgreSQL user for replication purposes. We'll use it to connect to the master instance from the slave and replicate data. To create the new user, execute:

psql -c "CREATE USER rep REPLICATION LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'password';"  

The user created is called rep. Make sure to replace the word password with a good password...


Now, let's start the master instance configuration files.

Edit the pg_hba.conf file, which controls client authentication for PostgreSQL. This file is usually located at /etc/postgresql/9.3/main/pg_hba.conf.

Add the line below at the end of the file, replacing IP-of-the-Slave with the real IP of the slave server. If you have it configured in the /etc/hosts file, you can use a hostname or an alias instead.

host    replication     rep     IP-of-the-Slave/32  md5  

Then, edit the main PostgreSQL config file, usually located at

/etc/postgresql/9.3/main/postgresql.conf

Uncomment and modify the settings so they match the lines below:

listen_addresses = 'localhost,0.0.0.0'  
wal_level = 'hot_standby'  
archive_mode = on  
archive_command = 'cd .'  
max_wal_senders = 1  
hot_standby = on  


To apply the changes, restart PostgreSQL.

/etc/init.d/postgresql restart

5. Configuring the slave server

First, let's log in to the slave server and stop PostgreSQL there.

/etc/init.d/postgresql stop



With the PostgreSQL service down, we can modify the configuration files. Edit the /etc/postgresql/9.3/main/pg_hba.conf file by adding the line below:

host    replication     rep     IP-of-the-Master/32  md5  

Make sure to replace IP-of-the-master with the IP of the master instance.


Then edit the file /etc/postgresql/9.3/main/postgresql.conf, uncommenting and modifying the settings so they match the lines below:

listen_addresses = 'localhost,0.0.0.0'  
wal_level = 'hot_standby'  
archive_mode = on  
archive_command = 'cd .'  
max_wal_senders = 1  
hot_standby = on  

6. Initial replication

Before the slave server can replicate from the master, we need to transfer the initial data to it.

Go to the master server and mark the start of a base backup:

psql -c "select pg_start_backup('initial_backup');"  


Then, copy the data directory to the slave server, excluding the pg_xlog files, and mark the backup as finished:

rsync -cva --inplace --exclude=*pg_xlog* /var/lib/postgresql/9.3/main/ slave_IP:/var/lib/postgresql/9.3/main/  
psql -c "select pg_stop_backup();"  

The pg_stop_backup() call also takes care of the backup cleanup on the master.



Log in to the slave server again and set up a recovery file: create (or replace) /var/lib/postgresql/9.3/main/recovery.conf with the lines below:

standby_mode = 'on'  
primary_conninfo = 'host=master-IP port=5432 user=rep password=password'  
trigger_file = '/tmp/postgresql.trigger.5432'  

Replace the host, port, user and password accordingly.

Note: later, if you create the trigger file on your slave machine, the slave will reconfigure itself to act as a master.
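
For example, with the recovery.conf above, promoting the slave to master is as simple as creating that file:

touch /tmp/postgresql.trigger.5432  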



To apply the changes, restart PostgreSQL in the slave server.

/etc/init.d/postgresql restart

At this point replication should already be working. If it isn't, check the log file /var/log/postgresql/postgresql-9.3-main.log for details.
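
You can also confirm from the master that a streaming replication client is connected. A quick check, run as the postgres user:

psql -c "select client_addr, state from pg_stat_replication;"  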


Backing up PostgreSQL

This section provides a quick overview of some tools that are designed to be used for PostgreSQL backups.

1. Using pg_dump

pg_dump is a basic but powerful tool for backing up PostgreSQL databases. It generates a text file of SQL commands that, when fed back to the server, recreate the database in the same state it was in at the time of the dump.

Backup

pg_dump examples:

pg_dump database-name > database-name.bck  

The command accepts more options, much like the PostgreSQL client. For example:

pg_dump -U username -h host -p port database-name > database-name.bck  

Restore

To restore data dumped with pg_dump, we use the psql client directly. The database must already exist in PostgreSQL and, if your DB has tables owned by other users, those users must be created as well.

For instance:

createuser myuser  
createdb -T template0 database-name  
psql database-name < database-name.bck  

2. Using pg_dumpall

Generally, pg_dumpall is used to dump the contents of all databases of a PostgreSQL instance.

Backup

The basic usage of pg_dumpall is:

pg_dumpall > backup_file  

Restore

The restore process is basically the same as for a dump made with pg_dump:

psql -f backup_file postgres  

3. Other methods

If you're looking for something more specialized, you can also look into PostgreSQL's continuous archiving and point-in-time recovery (PITR) capabilities.


Final notes

If you have any questions, suggestions, or other feedback about the topics presented please leave a comment below.

Take care of your data. Good luck!

]]>
<![CDATA[Getting started with Upstart]]>http://blog.terminal.com/getting-started-with-upstart/e663355e-7fcd-4be6-b1b1-7e012aa73246Tue, 18 Aug 2015 17:50:24 GMT

Server boot-up and service management are essential parts of any Unix-based system. The software in charge of them controls the operation of every system script and service.

In any server environment, problems can occur at certain points of startup and shutdown, and speed is also a priority during those periods.

Over the last year, many major distributions have adopted systemd as an all-in-one replacement for system startup management tools, but the design of systemd generated significant controversy within the free software community. Critics argue that systemd's architecture violates the Unix philosophy and that it will eventually form a system of interlocking dependencies.

As Upstart is still used in Ubuntu LTS (long-term support) releases like the one you can find on Terminal, I will provide a simple but useful introduction to using Upstart and creating your own service-handling scripts.


What's Upstart?

Upstart is an event-based service handling utility. It manages services during system boot and shutdown, and also monitors them while they're running.

Designed to overcome limitations of System V and other dependency-based init systems, Upstart has proven to work very well in Ubuntu, RHEL 6-based distributions, and ChromeOS.

As Upstart aims to be completely event-based, there are some significant conceptual differences between it and other init systems.

  • Task jobs: Simple working processes with a certain purpose.

  • Service jobs: Commonly called services, service jobs are working processes designed to run in the background. There are also special service jobs called abstract jobs, which run in the background forever unless they're stopped by root (or a user with enough privileges)

  • Events: Events or "calls" are basically occurrences used to trigger a certain action. The common forms of events refer to the monitoring of a process or a user-generated signal.


Writing Upstart jobs

At this point, I recommend making sure that you have somewhere to play with Upstart. Creating an Ubuntu Terminal is a good idea, as it has everything we need to follow this tutorial.

From the beginning

We will start by creating a basic Upstart job.

Let's create a new file called test.conf in /etc/init/ and put something like this inside it:

description "This is a test Job"  
author "John Rambo"

start on runlevel [2345]  
exec echo "Rambo was here at $(date)" >> /var/log/test.log  

Explanation: This simple Upstart config file appends a new line to a file called /var/log/test.log. It starts at runlevels 2, 3, 4, or 5, executing the command echo "Rambo was here at $(date)" >> /var/log/test.log. Easy, right?

Testing: To avoid any issues, you might want to make sure that the syntax of the new test.conf file is OK. To do that, we use the init-checkconf command.

init-checkconf /etc/init/test.conf  
File /etc/init/test.conf: syntax ok  

Now, we can use the service command to launch our job.

service test status  
test stop/waiting  

Initially, we notice that the job is actually stopped.

service test start  
test start/running, process 696  
service test status  
test stop/waiting  

Then we start the job, but as this is just a task job, it goes down as soon as it's finished.

cat /var/log/test.log  
Rambo was here at Mon Aug 17 13:56:34 EDT 2015  

Finally, we see that the job executed once, as we wanted.


Using service

The service command is used to execute System V init scripts or Upstart jobs in as predictable an environment as possible, removing most environment variables and setting the current working directory to / (the root directory).

Syntax:

sudo service <servicename> <control>  

For Upstart jobs, the service command can send several 'controls' like start, stop, restart, status, etc.


Writing a service job

As presented earlier, a service job is a working process designed to run permanently in the background.

This time, we will study a real-life example that I had to write last week: an Upstart job to keep iPython Notebook 4 (Jupyter) working in server mode, as a service.

Let's take a look to the config file:

description "iPython Notebook Jupyter Upstart script"  
author "Enrique Conci"

start on filesystem or runlevel [2345]  
stop on shutdown

script  
    export HOME="/root"; cd $HOME
    echo $$ > /var/run/ipython_start.pid
    exec jupyter-notebook --config='/root/.ipython/profile_nbserver/ipython_config.py'
end script

pre-start script  
    echo "[`date`] Starting iPython Notebook (Jupyter) Server" >> /var/log/ipython-ntb.log
end script

pre-stop script  
    rm /var/run/ipython_start.pid
    echo "[`date`] Stopping iPython Notebook (Jupyter)" >> /var/log/ipython-ntb.log
end script  

As you can see, this time the script is a little more interesting. We've added the script, pre-start script, and pre-stop script sections. These sections are commonly called stanzas.

Stanzas of this kind contain scripts to be executed at certain points of the service's lifetime. Each one must finish with the end script statement.

In this case, the script stanza sets up some variables and starts the jupyter-notebook process, passing the required config file as its parameter.

The pre-start script stanza will be executed before the script. It will log the start action.

The pre-stop script stanza will clean up the PID file and log the action.

Note that we've also added the stop on shutdown statement to our script, allowing Upstart to shut down the process gracefully when the server shuts down.

You can run the standard start, stop, restart, etc. commands for this service, or any similar Upstart job, using the syntax explained in the Using service section above.
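
For instance, assuming you saved the job file above as /etc/init/jupyter.conf, you could manage it like any other service:

sudo service jupyter start  
sudo service jupyter status  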


Final notes

If you ever need some help, don't hesitate to leave a comment or check a good resource like the Upstart Cookbook from Ubuntu.

We've only scratched the surface of the Upstart world, but from here you can start writing your own Upstart startup scripts.

Now go out there and write some good startup scripts!

]]>
<![CDATA[MySQL replication and backup methods]]>http://blog.terminal.com/mysql-replication-and-backup-methods-in-terminals/7d8fc438-433e-4246-86ee-4b0f2f7ce852Thu, 06 Aug 2015 17:12:43 GMT


Setting up a typical MySQL replicated environment with Terminals

There are several ways to set up database replication with MySQL. The most popular is the master-slave setup.

This replication process allows you to keep multiple copies of your MySQL data by synchronizing it automatically from a master to a slave database server. This lets you keep a backup database server that you can promote to master in case of issues, or use to facilitate backup activities.


1. Install MySQL

For this tutorial, we will be using a base Ubuntu snapshot.

First, we will create two new Terminals and install MySQL on them:

apt-get update  
apt-get install mysql-server  

Take note of the MySQL root passwords, as you will need them later.


2. Set up non-password access

(Non-mandatory step)

To avoid having to write your root password every time you access the MySQL user prompt, create a new .my.cnf file in the root's home directory with your password. Do the same in both Terminals.

cat > .my.cnf  
[client]
user=root  
password=yourpassword

# Control+D to finish

chmod 600 .my.cnf  

For security reasons, only root can read the password.


3. Link Terminals to each other

This step is needed when you're working in Terminals.

MySQL replication requires the servers to communicate with each other. To link the Terminals we will use the tlinks command line tool. tlinks is located in our terminal-tool GitHub repo.

For the scope of this post, we will just explain the commands used to install tlinks in Ubuntu for MySQL replication. For more information on how to install tlinks, check out this blog post.

apt-get -y install python-pip wget  
pip install terminalcloud  
wget https://raw.githubusercontent.com/terminalcloud/terminal-tools/master/tlinks.py  
chmod +x tlinks.py

./tlinks.py -u <myusertoken> -a <myaccestoken> show <anyterminal>

The last command shown is used to configure tlinks with your API tokens for the first time.

Now we can link our Terminals, allowing connections between ports 22 and 3306.

./tlinks.py link <master_terminal_subdomain> -s <slave_terminal_subdomain> -p 22,3306
./tlinks.py link <slave_terminal_subdomain> -s <master_terminal_subdomain> -p 22,3306

This will allow for MySQL and ssh connections between your Terminals in both directions.

You can check if the links were created correctly by using the show action.

./tlinks.py show <master_terminal_subdomain>
./tlinks.py show <slave_terminal_subdomain>

4. Configure the master instance

To configure the master instance you will have to modify the MySQL config file, usually located at /etc/mysql/my.cnf.

Open the my.cnf file, locate the settings listed below, and change them as follows:

bind-address = 0.0.0.0  
server-id = 1  
log-bin = /var/log/mysql/mysql-bin  
binlog-ignore-db = "mysql"  

You will have to uncomment the server-id and log-bin lines, and add binlog-ignore-db = "mysql" to your default config file.

Explanation notes:

  • bind-address sets the IP address of the interface that MySQL listens on. It's set to 127.0.0.1 by default, which only allows connections from the current host, but we need to accept connections from the SLAVE host; 0.0.0.0 means that MySQL will listen on all interfaces. (Binding to a specific address instead is useful when you have servers in exposed environments and want to allow MySQL traffic only through a specific network interface.)

  • server-id is an integer assigned to each MySQL server in a replication schema; each server must have a unique integer. Usually, it can be assigned in sequence.

  • log-bin sets where the MySQL binary log will be saved.

  • binlog-ignore-db setting prevents the replication of the mysql system database (users and grants will not be replicated).


5. Set up the replication user in the master instance

To complete this step, you will need the subdomain of the Terminal that you are using as the slave. For example, if the Terminal URL is dan123.terminal.com, your Terminal subdomain is dan123. Make sure that you can resolve your Terminal subdomain's IP by using the host command.

Access the MySQL prompt by executing mysql. If you did not add your password in the .my.cnf file, you will have to execute mysql -p and provide the password every time.

In the MySQL prompt:

mysql> grant replication slave on *.* TO 'replication'@'<SLAVE>' identified by '<some_password>';  
mysql> flush privileges;  
mysql> quit  

This will create a user called replication, assign a password, and grant it replication permissions. Note that we're only giving this user access from the slave server.

To test if this is configured correctly, log in to the slave server then try to log in to the master MySQL instance as the replication user by executing:

mysql -h<MASTER> -ureplication -p  

If everything is correctly configured, you will get the MySQL command prompt.


6. Obtain the replication master binary log coordinates

To configure replication on the slave, you need to obtain the master binary log position. In order to get that information, log in to the MySQL command prompt in your master host and execute:

mysql> FLUSH TABLES WITH READ LOCK;  
mysql> SHOW MASTER STATUS;  
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      345 |              | mysql            |
+------------------+----------+--------------+------------------+

The position in this example is 345.
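
Keep in mind that the FLUSH TABLES WITH READ LOCK above blocks writes on the master. Once you've recorded the log file and position (and copied any existing data over to the slave), release the lock:

mysql> UNLOCK TABLES;  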


7. Configure the slave instance

As with the master instance, we must also configure the MySQL configuration file in the slave instance, usually located at /etc/mysql/my.cnf.

Locate the bind-address and server-id settings and modify them as in this example:

bind-address = 0.0.0.0  
server-id = 2  



Now, let's configure the master details at the slave MySQL command prompt.

mysql>stop slave;  
mysql>CHANGE MASTER TO MASTER_HOST='<MASTER>', MASTER_USER='replication', MASTER_PASSWORD='<replication user password>', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=345;  
mysql>start slave;  

Explanation Note:

  • The server-id value is different from the master server.
  • We used the MASTER_LOG_FILE and MASTER_LOG_POS obtained in the previous steps.

Side Note:

In MySQL versions up to 5.5, the master MySQL credentials could instead be stored as plain text in the my.cnf file as follows:

master-host = [MASTER_IP]  
master-user = replication  
master-password = [replication_password]  

8. Test

If everything went well, your slave instance should already be replicating data from your master server.

To test this, log in to your slave MySQL instance and execute show slave status \G.

The output should look like this:

mysql> show slave status \G  
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: qmaxquique123
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 515
               Relay_Log_File: mysqld-relay-bin.000002
                Relay_Log_Pos: 661
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 515
              Relay_Log_Space: 818
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No  
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1



Explanation Note:

  • Look at Slave_IO_Running: Yes and Slave_SQL_Running: Yes first. If both are Yes, your slave server is running both replication workers.
  • If something went wrong, you will find a short description of the problem in the Last_SQL_Error field.

If you have any issues or questions about applying this procedure, please leave a comment and I will try to help!


Backing up MySQL databases

It goes almost without saying how important backups are in the IT world. If you think they're not, well... think about it again!

In the next section, I will show some ways to back up database instances.


Suggestion 1: mysqldump

mysqldump is a logical backup tool included in both the community and enterprise editions of MySQL. It's a simple and direct method to obtain reusable data dumps from your MySQL databases, supporting backups from all storage engines.

Backup

The mysqldump command line utility can be used to backup local or remote databases. Check out these examples to learn more:

mysqldump -uroot -hmydbserver --all-databases -p > dump-$( date '+%Y-%m-%d_%H-%M-%S' ).sql  

This command will connect to a remote DB instance running on mydbserver as root and dump all databases to a file named after the current date and time.

You can also use mysqldump to get data from only one database:

mysqldump -ujohn -p john_db > john_db.sql  

This time we didn't specify a server. By default mysqldump will connect to a database instance running on localhost and dump a database called john_db.

Restore

Restoring a database dump taken with mysqldump is quite straightforward. You just need to pass the file as STDIN to the mysql command line client. For example:

mysql -uroot -hmydbserver -p < backupFile.sql  

As you can see, this command restores the contents of the backupFile.sql file to the MySQL database instance running on mydbserver.


Suggestion 2: Copy datafiles the hard way

This backup method might not suit all situations, as it requires you to shut down your MySQL instance to back up the physical datafiles as if they were any other kind of file.

This method is best for migrating an entire database instance.

Backup

Consider this example:

/etc/init.d/mysqld stop
d_name=$(date '+%Y-%m-%d_%H-%M')  
mkdir -p "/db_backups/"  
tar cjf "/db_backups/$d_name.tar.bz2" /var/lib/mysql  
/etc/init.d/mysqld start

Note that this method involves a DB outage and, depending on the size of your datafile, it might take anywhere from just seconds to several hours.

Restore

Restoring the backup will overwrite the entire instance.
Make sure that you know what you're doing!

/etc/init.d/mysqld stop
tar xjf /db_backups/2015-07-27_10-11.tar.bz2 -C /  
/etc/init.d/mysqld start

Again, this will overwrite all data, including all user passwords.

Sidenote: If you need to restore the root password, execute:

/etc/init.d/mysql stop
mysqld_safe --skip-grant-tables --skip-networking &  
mysql -u root  

Then, in the MySQL prompt:

mysql> use mysql;  
mysql> update user set password=PASSWORD("NEWPASSWORD") where user="root";  
mysql> flush privileges;  
mysql> Ctrl+D  

And back in the shell:

/etc/init.d/mysql restart

Suggestion 3: AutoMySQLBackup

AutoMySQLBackup has some great features for backing up a single database, multiple databases, or all the databases on the server.

Each database is saved in a separate file that can be compressed with gzip or bzip2.

AutoMySQLBackup will rotate the backups and prevent them from filling up your hard drive. The daily backup keeps the last 7 days of backups, or you can enable weekly backups to keep one for each week.

Installation

AutoMySQLBackup is available in most Linux distributions. You should be able to install it on Ubuntu/Debian or CentOS/RHEL by executing:

apt-get install automysqlbackup || yum install automysqlbackup  

You can also download the AutoMySQLBackup script directly from the project page.

Configuration

As AutoMySQLBackup is a shell script, it can be configured and modified directly.

To configure AutoMySQLBackup, open the script with your preferred text editor, locate the parameters to be changed, and modify them as in the example below:

USERNAME=dbuser  
PASSWORD=password  
DBHOST=localhost  
DBNAMES="DB1 DB2 DB3"  
BACKUPDIR="/backups"  

Finally, add automysqlbackup to root's daily crontab to make sure you're backing up your databases every day.
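
For example, a root crontab entry along these lines would run the backup every night at 2 AM (the script path is an assumption -- check where your distribution installed it):

0 2 * * * /usr/sbin/automysqlbackup  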


Suggestion 4: Percona Xtrabackup

Percona XtraBackup is a free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.

Basic backup (all databases)

In this post we will only explore some basic usage examples of this tool.

This is a typical example of how to use Xtrabackup to backup an entire DB instance:

d_date=$(date '+%Y-%m-%d_%H-%M')  
mkdir -p "/backups/$d_date"  
innobackupex --no-timestamp "/backups/$d_date"  
innobackupex --use-memory=4G --apply-log "/backups/$d_date"  

This will back up all the datafiles in your datadir, as specified in your my.cnf file.

Restore

This copies the backed-up files back into the datadir, exactly as they were saved. The datadir MUST be empty and MySQL MUST be down.

innobackupex --copy-back /data/backups/<backup dir to be restored>/  

After the restore is complete, make sure to change the permissions as needed and start MySQL again.

chown -R mysql:mysql /var/lib/mysql  
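
Putting the full restore together, the sequence looks roughly like this (a sketch; double-check your paths before emptying the datadir):

/etc/init.d/mysqld stop
# the datadir must be empty -- this deletes the current data, so be careful!
rm -rf /var/lib/mysql/*
innobackupex --copy-back /data/backups/<backup dir to be restored>/
chown -R mysql:mysql /var/lib/mysql
/etc/init.d/mysqld start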

Final notes

If you have any questions, suggestions, or other feedback about the topics presented, please leave a comment below.

Take care of your data. Good luck!

]]>
<![CDATA[Quick tip: mounting a Terminal in your Mac]]>http://blog.terminal.com/mounting-a-terminal-in-your-mac/a8981ac3-0f5a-4d82-9e5a-888f90eb810cMon, 03 Aug 2015 17:31:43 GMT


Ever wanted to access your Terminal files locally? In this post, we'll show you how to use SSHFS over Fuse to mount any directory on a Terminal into a directory on your local OSX computer. SSHFS relies on the SSH protocol to access the files in your Terminal.

To do this, first make sure that SSH is configured correctly on your machine to use the Terminal proxy.


Mounting a Terminal on a local directory

1. Install the components needed

Get the latest stable OSXFUSE and SSHFS installers. At the time of writing, the latest version of OSXFUSE is 2.7.5 and the latest version of SSHFS is 2.5.0.
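
If you prefer Homebrew and have Homebrew Cask set up, installing both that way may also work; the official installers above are the tested path:

# hypothetical alternative to the official installers
brew cask install osxfuse
brew install sshfs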

2. Mount a directory

First, make sure you have SSH access to your Terminal.

If you spin up a new Terminal for this purpose, you can select your SSH key there. A key that isn't password-protected saves you some time every time you mount the FUSE filesystem.

Once you have both tools installed and have the Terminal you want to connect to up and running, open a terminal window on your Mac and use the sshfs command to mount it wherever you want:

~ enrique$ mkdir remote
~ enrique$ sshfs root@qmaxquique3195.terminal.com: remote
The authenticity of host 'qmaxquique3195.terminal.com (<no hostip for proxy command>)' can't be established.  
RSA key fingerprint is 3f:28:2d:b9:0a:ce:18:2f:a4:37:3a:53:9e:8c:f9:36.  
Are you sure you want to continue connecting (yes/no)? yes  
~ enrique$ cd remote
~/remote enrique$ ls -ltr
total 0  
-rw-r--r--  1 enrique  staff  0 Jul 31 12:52 newfile

If you open Finder, you will see a new icon called OSXFUSE Volume1 (sshfs) in your home directory. You can now use that directory like any other kind of network shared folder.

You can also mount the whole Terminal root filesystem in the same way. For example:

~ enrique$ mkdir remote2
~ enrique$ sshfs qmaxquique3195.terminal.com:/ remote2
~ enrique$ cd remote2
~/remote2 enrique$ ls -ltr
total 208  
drwxr-xr-x  1 enrique  staff   4096 Oct 31  2013 public  
drwx------  1 enrique  staff  16384 Jul 21  2014 lost+found  
-rw-r--r--  1 enrique  staff      0 Jul 21  2014 fastboot
drwxr-xr-x  1 enrique  staff   4096 Jul 21  2014 opt  
drwxr-xr-x  1 enrique  staff   4096 Jul 21  2014 mnt  
drwxr-xr-x  1 enrique  staff   4096 Jul 21  2014 media  
drwxr-xr-x  1 enrique  staff   4096 Jul 21  2014 boot  
drwxr-xr-x  1 enrique  staff   4096 Jul 21  2014 var  
drwxr-xr-x  1 enrique  staff   4096 Jul 21  2014 usr  
drwxr-xr-x  1 enrique  staff   4096 Jul 23  2014 lib  
drwxr-xr-x  1 enrique  staff   4096 Jul 23  2014 uploads  
drwxr-xr-x  1 enrique  staff   4096 Jul 23  2014 home  
drwxr-xr-x  1 enrique  staff   4096 May 15 14:39 lib64  
drwxr-xr-x  1 enrique  staff   4096 Jun 22 09:14 srv  
drwxr-xr-x  1 enrique  staff   4096 Jun 22 09:18 sbin  
drwxr-xr-x  1 enrique  staff   4096 Jun 22 09:18 bin  
drwxr-xr-x  1 enrique  staff      0 Jul 31 12:49 sys  
dr-xr-xr-x  1 enrique  staff      0 Jul 31 12:49 proc  
drwxr-xr-x  1 enrique  staff   4096 Jul 31 12:49 gfshome  
drwxr-xr-x  1 enrique  staff   4096 Jul 31 12:50 etc  
drwxr-xr-x  1 enrique  staff    680 Jul 31 12:50 dev  
drwxr-xr-x  1 enrique  staff   4096 Jul 31 12:50 CL  
drwxr-xr-x  1 enrique  staff   4096 Jul 31 12:50 local  
drwxrwxrwt  1 enrique  staff   4096 Jul 31 13:01 tmp  
drwx------  1 enrique  staff   4096 Jul 31 13:02 root  
drwxr-xr-x  1 enrique  staff    460 Jul 31 13:07 run  

In Finder, a new OSXFUSE folder icon will appear, and inside it, the contents of your Terminal's root filesystem.
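
When you're done working, unmount the directory like any other volume:

# unmount from the shell...
umount ~/remote
# ...or force it with diskutil if the volume is busy
diskutil unmount force ~/remote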


Notes about performance

SSHFS works over a secure protocol (SSH), and the SSH encryption introduces a noticeable performance penalty.

Additionally, a remotely-mounted directory's performance is directly affected by your internet speed, the distance between you and Terminal's servers, and other network conditions.
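
A few sshfs options can soften the impact on slow links; support varies by version, so treat this as a sketch:

# -C compresses SSH traffic; reconnect and auto_cache help on flaky connections
sshfs -C -o reconnect,auto_cache root@qmaxquique3195.terminal.com: remote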


Final notes

Mounting a Terminal directory directly on your computer lets you work with local IDEs and text editors on your remote files. Please keep the performance notes above in mind when working with a mounted folder, as this feature is not practical in every networking scenario.

Don't be shy when you're testing this out; let us know if it works for you in the comments section!

]]>
<![CDATA[Stats on Ubuntu: getting an up-to-date R environment]]>http://blog.terminal.com/getting-an-up-to-date-r-and-rstudio-installation-on-ubuntu/e7d8b4f8-e216-46cf-be50-f56d0b93a62bTue, 28 Jul 2015 18:38:59 GMTUbuntu is a great environment for data scientists. The out-of-the box repositories have more recent software than you get in Red-Hat distros, so apt-cache search and apt-get install usually take care of the binary dependencies for data science tools.

But unlike the C libraries we link against, our core tools need to be very current. For example, on Ubuntu 14.04, the default R available with apt is version 3.0.2, which is old enough that it doesn't support dplyr. This won't do.
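
You can confirm what apt would give you before going any further:

# shows the installed and candidate versions from the stock repositories
apt-cache policy r-base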

I'm going to show you the scripts I use to get an up-to-date install of R. We will also install Rstudio-server, and a few R packages I find essential.

1. Install binary dependencies

Although the Ubuntu repository's version of R is too old, the libraries we need are all recent enough. To install them:

apt-get update  
apt-get install -y libcurl4-openssl-dev  
apt-get install -y libgstreamer-plugins-base0.10-0  
apt-get install -y gdebi-core  
apt-get install -y libapparmor1  
apt-get install -y libxml2-dev  
apt-get install -y libcurl4-gnutls-dev  

You likely need to run this with sudo if you aren't on Terminal.com. On Terminal you are already root.

2. Install R

The best way to get an up-to-date version of R is to install it from an Ubuntu repository hosted on one of the CRAN mirrors. Here's how to do it using the Berkeley mirror [1]:

echo 'deb http://cran.cnr.Berkeley.edu/bin/linux/ubuntu trusty/' >> /etc/apt/sources.list  
apt-get update  
apt-get install -y --force-yes r-base r-base-dev  
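
Afterwards, confirm that the newer interpreter is the one you get:

# should report a recent 3.x release rather than 3.0.2
R --version | head -1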

3. Install essential R packages

Our next step is to install some essential packages from CRAN.

Set a default mirror

By default, R makes you choose a mirror in every session in which you install CRAN packages. The first step I take after a fresh install is to set a default CRAN mirror.

cat >> /etc/R/Rprofile.site << EOF  
local({  
  # add MASS to the default packages, set a CRAN mirror
  old <- getOption("defaultPackages"); r <- getOption("repos")
  r["CRAN"] <- "http://cran.rstudio.com"
  options(defaultPackages = c(old, "MASS"), repos = r)
})
EOF  

I use RStudio's mirror because it supports https. They also use CloudFront to distribute their service, so performance should be pretty good anywhere in the world.

Install packages

You can install packages from the shell using R CMD INSTALL. But since it's standard practice to install from inside R using install.packages, I prefer to do my installs in an R script. You can paste the following code into R interactively, or put it in a file install-packages.R and then run R -f install-packages.R from the shell:

# basic development packages
install.packages("devtools")  
install.packages("roxygen2")  
install.packages("testthat")  
install.packages("knitr")

# key packages for data wrangling and visualization
install.packages("dplyr")  
install.packages("tidyr")  
install.packages("plyr")  
install.packages("stringr")  
install.packages("ggplot2")  

You can see that my tastes run toward using Hadley Wickham's toolkit for both development and data processing. Feel free to replace these installs with whichever packages you find essential.
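
For quick additions later on, you can also skip the interactive session entirely; the package names below are just examples:

# install extra CRAN packages non-interactively from the shell
R -e 'install.packages(c("data.table", "lubridate"))'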

4. Install RStudio server

I recommend having a GUI IDE for data science, since it makes plotting easy. The most popular one by far for R is RStudio. We will install the RStudio Server version, which runs in the browser and is great for working in the cloud:

wget http://download2.rstudio.org/rstudio-server-0.99.467-amd64.deb  
gdebi -n rstudio-server-0.99.467-amd64.deb  

Once this runs [2], you'll have an instance of RStudio listening on port 8787; you can log in using any valid non-root username on your system. On Terminal.com, you are root by default on your terminals, so you'll need to create a user. I'll describe how to do this in the next section.

You can start and stop the server from the command line using rstudio-server start and rstudio-server stop.
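
To sanity-check that the server is actually up, any HTTP client will do; for example:

# expect an HTTP response (200 or a redirect) if RStudio is listening
curl -I http://localhost:8787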

5. Run RStudio server

Create a new user

On Terminal.com, you are root by default. This is great for setting up your environment, but RStudio doesn't allow remote login for the root user, so we need to create a new one:

useradd rstudio  
mkdir /home/rstudio  
chown rstudio:rstudio /home/rstudio  
# set the password non-interactively; passwd may refuse piped input
echo 'rstudio:rstudio' | chpasswd  

To see your shiny new RStudio installation, just go to <your-hostname>-8787.terminal.com. For example, I tested out these scripts on stroxler37.terminal.com, so to work on RStudio, I go to stroxler37-8787.terminal.com and log in as rstudio with password rstudio.

Extra steps if you are not on Terminal.com

If you install RStudio on another server, you'll need to open port 8787 to the internet. Also make sure the user you create has a secure password. (Terminal.com handles security for you, since only you can see web services running on your own machines unless you explicitly say otherwise.) To see RStudio, you'll navigate to <your-ip-or-hostname>:8787 in the browser.

Find this code on Github

If you'd like this code in script form, you can find it in the ubuntu/R directory of ape2, my github repo for setup scripts.

Happy hypothesis testing!

About me

I'm Steven Troxler, also known as Trox. I'm an engineer at Terminal.com with a background in math and statistics.

Notes

  1. The '--force-yes' option is considered unsafe in general, but as long as you trust Berkeley's CRAN mirror, you don't need to worry. If you are concerned, it's possible to add the necessary credentials to apt. The CRAN docs on installing R indicate that 'sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E084DAB9' should do the trick.

  2. Some people consider the -n option for gdebi insecure. If you trust RStudio's .deb file it isn't a problem, but feel free to omit it. You'll be prompted before the install.

]]>