Open Software for Restricted Data: an Environmental Epidemiology example
Table of Contents
- 1 Copyright
- 2 Abstract
- 3 Introduction /introduction.html
- 4 Deploying Virtual Machines
- 5 The Brawn
- 6 The Brains
- 6.1 R Server
- 6.1.1 R
- 6.1.2 package management and R updates
- 6.1.3 Rstudio
- 6.1.4 firewall access
- 6.1.5 SSL/HTTPS and running a proxy server
- 6.1.6 git
- 6.1.7 ssh for github
- 6.1.8 gdal
- 6.1.9 geos
- 6.1.10 or under ubuntu
- 6.1.11 test readOGR
- 6.1.12 rgraphviz
- 6.1.13 test
- 6.1.14 install just the postgres bits required for RPostgreSQL package
- 6.1.15 postgis utilities
- 6.1.16 unixODBC
- 6.2 Test the Backups of this Minimal R Server.
- 6.3 Oracle XE Permissions and Users System
- 6.3.1 backup local ubuntu version
- 6.3.2 INIT
- 6.3.3 SWAP
- 6.3.4 DOWNLOAD AND SCP
- 6.3.5 Install the database
- 6.3.6 import application and set Security
- 6.3.7 explain the table creation script
- 6.3.8 set up R, RJDBC and ROracle
- 6.3.9 maintenance
- 6.3.10 unlock
- 6.3.11 unlock header
- 6.3.12 extending/revising the database to track servers and files
- 6.4 DDIindex
- 6.5 Set up a private git server for data and code
- 6.6 True-Crypt encrypted volumes
- 6.7 ResearchData storage
- 6.8 example workflow to dev a local data package and publish to shared
- 6.9 Share with windows network
- 7 Procedures for adding users to the system.
- 8 Backups
- 8.1 General
- 8.2 General concerns
- 8.3 Brains
- 8.4 Brawn
- 8.4.1 Find out how big it is
- 8.4.2 Dump and download it to a secure computer
- 8.4.3 or simple home backup
- 8.4.4 Restore database dump into a new database on another machine.
- 8.4.5 launch a new Nectar VM from the snapshot image
- 8.4.6 mount the 2nd disc and load/restore the postgres db and data into it
- 8.5 Disaster Recovery Plan
- 9 Examples
1 Copyright
Licensed under the GNU General Public License version 2, WITHOUT ANY WARRANTY. Ivan C Hanigan <ivan.hanigan@gmail.com>
2 Abstract
Ivan Hanigan 1, Steven McEachern 2, David Fisher 3.
- 1 National Centre for Epidemiology and Population Health, Australian National University
- 2 Australian Data Archive, Australian National University
- 3 Information Technology Services, Australian National University
Last Updated 2nd December 2015
Increasing concerns over privacy in Australia and globally, combined with the risk of hacking and the accidental release of large-scale data sets, are leading to increased restrictions on the use of confidential, highly sensitive health data. This coincides with increased statistical and computational power, which has the potential to glean many new insights from already collected data. Unfortunately, however, the two trends risk cancelling each other out. It is thus imperative that universities and other institutions with access to large data sets manage them in ways that maintain organisational and public confidence in their integrity.
This paper presents the design and development of a Virtual Laboratory for analysing restricted data using open software. These tools were assembled with the aim to allow users to access restricted data in an appropriate and safe manner whilst allowing use of open software to enhance reproducibility and accessibility. The system implementation is described specifically for the Australian National Research Cloud http://www.nectar.org.au/research-cloud/.
We present a case study from Environmental Epidemiology using confidential health records, which was a motivating reason for us to develop this system. In the example presented here, we provide a simple analysis of the association of suicide with drought across NSW, and also with votes for the conservative parties; both have previously been found to increase the risk of suicide in NSW. The paper concludes with a reflection on the implications of applying these open software tools to restricted access data such as the Australian Deaths dataset.
3 Introduction /introduction.html
We present an environment for analysing restricted data using open software. The system is described using an analysis of the historical association of suicides with drought in Australia, which is then extrapolated under climate change and adaptation scenarios. These tools were assembled to allow users to access restricted data in a manner that protects the confidentiality of sensitive data, whilst also allowing use of open software for reproducibility. Recently, restrictions on access to confidential health records have increased, especially for sensitive data such as the suicide records used in our case study. Previous solutions to this challenge make access so restricted that usability is compromised. We aimed instead to build a collection of tools for the conduct of many types of health and social science research. The starting point for users is the data catalogue, which provides for finding data available from the store of unrestricted and restricted data for approved use. Once data are discovered, the researcher can manipulate the datasets on the secure server: the PostgreSQL database integrates the data and Geoserver visualises it, while statistical tools are available in the RStudio server browser. Such analytical tools will enhance the ability of adaptive management practitioners to assess the potential influence of adaptations. Use of the system shows the ease with which multiple data sources (some restricted) can be analysed in a secure way using open software. This will build capacity to answer complex research questions and compare multiple climate change scenarios or adaptation assumptions, achieving a simultaneous view of potential future outcomes from different standpoints.
4 Deploying Virtual Machines
4.1 Description
We want to deploy the service as a symbiotic pair of virtual images that together provide integrated data storage and analysis services in the cloud. Our recent review of the system prototype suggested a need for two servers to comprise this system, to provide a more robust and powerful combination of structural and functional aspects.
The services hosted by the pair of servers will be:
- The Brains: a Statistical Analysis engine (running the R environment) integrated with a metadata registry, and
- The Brawn: a PostgreSQL Geographical Information System (GIS) Database server.
4.2 Accessing Virtual Machines on the Australian Research Cloud
4.2.1 Launch an instance
- Log on to the research cloud using your Australian University details http://www.nectar.org.au/research-cloud/.
- create two security groups, brawn and brains; these will hold the firewall settings for each server
- add ssh port 22 (add http(s) 80 etc, postgres 5432, icmp -1 as needed)
- Under images and snapshots launch the centos 6.2 amd64 image
- give your server a name, add security group
- specify a ssh keypair http://support.rc.nectar.org.au/demos/launching_123.html
- launch and note the ip address
- go to the access and security section and edit the rules (ie for http add a rule to allow access from port 8080 to 9000 with 0.0.0.0/0 (CIDR))
- Note that this allows access to the whole world, we will think about securing the server later
4.2.2 security-groups header
4.2.3 requesting help
To: rc-support@nectar.org.au Subject: Request about snapshot status
Hi,
My AAF email I login to the cloud with is: ivan.hanigan@anu.edu.au
What is currently happening: I took a snapshot of one VM and it said success but now in the status column it has 'shutoff' and a spinning wheel.
What should happen: It should return to status 'Active'.
Cheers, Ivan Hanigan Data Management Officer. Thursday PhD scholar (ANU/CSIRO). National Centre for Epidemiology and Population Health. College of Medicine, Biology and Environment. Australian National University Canberra, ACT, 0200. Ph: +61 2 6125 7767. Fax: +61 2 6125 0740. Mob: 0428 265 976. CRICOS provider #00120C.
4.2.4 connect using ssh
If using Windows make sure you have git or WinSCP installed. Opening a bash shell from these will enable you to connect to your server using ssh. Then the contents of the following orgmode chunks can be evaluated on the remote server. If you're on Linux then orgmode can execute these chunks using C-c C-c.
whoami
In the next chunk, insert the relevant ip address and you may have to answer yes to the question about adding this RSA fingerprint to your list.
cd ~/.ssh
ssh -i keypairname root@your.new.ip.address
# it is prudent to set a hideously long password for root
# passwd root
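To avoid typing the key path and address every time, an entry can be added to the ssh client configuration on your local machine. This is a sketch assuming OpenSSH; the host alias `nectar-vm` is hypothetical, and `keypairname` and the address are the same placeholders used above.

```
# ~/.ssh/config on your local machine (host alias "nectar-vm" is hypothetical)
Host nectar-vm
    HostName your.new.ip.address
    User root
    IdentityFile ~/.ssh/keypairname
```

After saving this, `ssh nectar-vm` connects with the right key automatically.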
4.3 Setting up the basic server framework
This is done in a similar way for both the Brains and the Brawn servers.
4.3.1 Install any updates using the yum package manager
It is recommended to do this every week to keep the server in good condition, especially regarding security updates.
yum update
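The weekly update habit can be automated with a cron job. A minimal sketch, assuming the standard /etc/cron.weekly mechanism on CentOS; the script name and log path are hypothetical:

```
#!/bin/sh
# /etc/cron.weekly/yum-update (hypothetical filename; make it executable with chmod 755)
# apply all available updates non-interactively and keep a log of what happened
/usr/bin/yum -y update >> /var/log/yum-weekly.log 2>&1
```

Note that fully unattended updates can occasionally break services, so reviewing the log after each run is prudent.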
4.4 Security measures
The following section sets some restrictions on the server. I would like to know how important it is to restrict root login, and whether we can permit login via ssh on port 22 if it is only open to the NCEPH VPN range. If so, do I just leave the settings below as yes/yes/yes? I had some advice from a sysadmin at work about the following, but I am sure it is still fairly insecure. There are various TODOs hidden in the source code document.
################################################################
# name:security
# visit /etc/ssh/sshd_config
# under authentication remark out
#RSAAuthentication yes
#PubkeyAuthentication yes
#AuthorizedKeysFile .ssh/authorized_keys
4.4.1 TODO add a ssh key
- get the private/public key generated
- insert the public key
4.4.2 Restrict ssh access to a specific ip address/range
4.5 Setting the hosts.* TCP Wrappers
4.5.1 Setting the hosts.* TCP Wrappers header
4.5.2 Security Enhanced Linux (selinux)
To run the database and the RStudio server it is best to disable SELinux.
- TODO Find out if this is necessary for PostgreSQL as well as Rstudio.
# selinux config
vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# Change SELINUX=enforcing to disabled
4.5.3 Install some base packages
There are a few commonly used packages recommended for both Brawn and Brains.
yum install gcc-gfortran gcc-c++ readline-devel libpng-devel libX11-devel libXt-devel texinfo-tex.x86_64 tetex-dvips docbook-utils-pdf cairo-devel java-1.6.0-openjdk-devel libxml2-devel make unzip hdparm
4.6 Hardware set-up
Now we will note the size and number of the disc partitions.
df -h
4.6.1 Swap space /swapon.html
---
name: swapon
layout: default
title: Swap space
---
For some reason the Research Cloud CentOS VMs do not have any swap space. I added a one GB swap file, but was advised to enable about the same amount as we have RAM. I will come back and review this; also, I have not yet added it to the boot configuration, so swapon must be run after every reboot.
#### Code
free -m | grep Swap
Add swap file with this:
#### Code
# Create an empty file called /swapfile
# (over 2 GB is required for oracle; for just one GB use count=1024. Need to come back and review)
dd if=/dev/zero of=/swapfile bs=1024000 count=3000
# Format the new file to make it a swap file
mkswap /swapfile
# Enable the new swapfile.
swapon /swapfile
free -m | grep Swap
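To make the swap file survive reboots (instead of running swapon every time), a standard swap entry can be registered in /etc/fstab; a sketch:

```
# append to /etc/fstab so the swap file is enabled at boot
/swapfile none swap sw 0 0
```

After the next reboot, `free -m | grep Swap` should show the swap space without any manual swapon.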
4.6.2 Persistent Net Rules Should Be Avoided On Centos
The current "CentOS 6.2 amd64" image (ID 21401), if launched and snapshotted, is not connectible when launched from this snapshot due to networking. (davidjb, Mon Mar 26, 2012 support.rc.nectar.org.au/forum).
################################################################
# name:netrules
# Just remove the /lib/udev/write_net_rules file (and
# /etc/udev/rules.d/70-persistent-net.rules for good measure), and then
# any further instances will always have their networking adapter as
# eth0. There's probably a better way to do this, but that's working for
# me now
rm /lib/udev/write_net_rules
rm /etc/udev/rules.d/70-persistent-net.rules
4.6.3 Disk Storage
Every VM on the Research Cloud has a 10GB primary disk which is used for the image you launch. The primary disk is copied in a snapshot, so anything on this primary disk can be backed up via snapshots. In addition, every virtual machine gets secondary storage. This secondary disk is not copied or backed up via snapshots. If you reboot your virtual machine, the secondary disk data remains intact; if you shut down your VM, the data disappears (it is not persistent).
######################################################################
# Mount the drive to /home e.g.
mount -t ext3 /dev/vdb /home
# df -h
# once successful edit /etc/fstab to mount the drive at boot e.g.
# /dev/vdb /home ext3 defaults 0 0
# Reboot the server to ensure that the drive mounts on boot.
# check performance
hdparm -tT /dev/vdb
- TODO find out recommended practice for backups, and purpose of 'object storage'
- http://support.rc.nectar.org.au/technical_guides/object_storage.html
- go to settings (topright), EC2 credentials, select project and download zip
- unzip and open ec2rc.sh in text editor
- sudo apt-get install python-boto
- list existing buckets
import boto.s3.connection

connection = boto.s3.connection.S3Connection(
    aws_access_key_id='7b7a0a6f71994e42a07c8e9b0de0f8ca',
    aws_secret_access_key='2d54f6a679cd4b06820550459451cb50',
    port=8888,
    host='swift.rc.nectar.org.au',
    is_secure=True,
    calling_format=boto.s3.connection.OrdinaryCallingFormat())

buckets = connection.get_all_buckets()
print buckets
- a downside is that you have to use Python to work with the data?
5 The Brawn
The Brawn is a PostgreSQL Geographical Information System (GIS) Database server.
5.1 OpenGeo Suite (Postgres and friends)
5.1.1 Restricted OpenGeo Suite /opengeosuite-restricted.html
- The OpenGeo Suite Installer
---
name: opengeosuite-restricted
layout: default
title: Restricted OpenGeo Suite
---
<li><a href="#sec-5-3-8">TODO Previous: </a></li>
<li><a href="/opengeosuite-upgrade-tomcat6.html">Next: Disable Tomcat</a></li>
<p></p>
The OpenGeo Suite provides a complete spatial data server software stack. I use this for the batch install and configuration (TODO look into the individual components offered by OpenGeo). I don't want the Brawn server to run the tomcat etc. [Installation instructions are here](http://suite.opengeo.org/opengeo-docs/installation/linux/centos/suite.html) but beware that the Redhat version requires the optional channel to be enabled.
#### Code Centos
# find out your os
cat /etc/issue
uname -r
# if centos
cd /etc/yum.repos.d
wget http://yum.opengeo.org/suite/v3/centos/6/x86_64/OpenGeo.repo
yum install opengeo-suite
#### Code Redhat
cd /etc/yum.repos.d
wget http://yum.opengeo.org/suite/v3/rhel/6/x86_64/OpenGeo.repo
yum install opengeo-suite
- Upgrade tomcat6
#### Code
service tomcat6 stop
chkconfig tomcat6 off
cd /usr/share
wget http://mirror.overthewire.com.au/pub/apache/tomcat/tomcat-6/v6.0.37/bin/apache-tomcat-6.0.37.tar.gz
tar -xzf apache-tomcat-6.0.37.tar.gz
cd /usr/share/apache-tomcat-6.0.37
vi conf/server.xml
# change <Connector port="8081"/> from 8080
# use vi /etc/sysconfig/iptables to check this port is open in firewall
vi conf/tomcat-users.xml
# if you want to use the manager (I did not, for security reasons)
# add a user with roles of admin,manager-gui like this
# <role rolename="manager-gui"/>
# <user username="tomcat" password="s3cret" roles="manager-gui"/>
# To start Tomcat, go to the bin folder of the Tomcat installation and run the startup.sh script
./bin/startup.sh
./bin/shutdown.sh
# make a script for starting
cd /etc/init.d
vi tomcat
# add this
# to find java_home use "update-alternatives --display java" and remove bin/java
####
#!/bin/bash
# description: Tomcat Start Stop Restart
# processname: tomcat
# chkconfig: 234 20 80
JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH
CATALINA_HOME=/usr/share/apache-tomcat-6.0.37
case $1 in
start)
  sh $CATALINA_HOME/bin/startup.sh ;;
stop)
  sh $CATALINA_HOME/bin/shutdown.sh ;;
restart)
  sh $CATALINA_HOME/bin/shutdown.sh
  sh $CATALINA_HOME/bin/startup.sh ;;
esac
exit 0
####
# now
chmod 755 tomcat
chkconfig --add tomcat
chkconfig --level 234 tomcat on
# verify
chkconfig --list tomcat
service tomcat start
# test stop, start, restart
more /usr/share/apache-tomcat-6.0.37/logs/catalina.out
# to check for errors
# so assuming ddiindex was previously running on tomcat6
service tomcat stop
cd /usr/share/tomcat6/webapps
cp -r dashboard /usr/share/apache-tomcat-6.0.37/webapps
cp -r geowebcache /usr/share/apache-tomcat-6.0.37/webapps
cp -r geoserver /usr/share/apache-tomcat-6.0.37/webapps
cp -r opengeo-docs /usr/share/apache-tomcat-6.0.37/webapps
cp -r geoexplorer /usr/share/apache-tomcat-6.0.37/webapps
cp -r recipes /usr/share/apache-tomcat-6.0.37/webapps
cd /usr/share/apache-tomcat-6.0.37
service tomcat start
# now you have installed the suite; go to the dashboard website
# http://ip.address.of.server:8081/dashboard
# then go to the geoserver and log in as admin with password geoserver
# follow instructions there to enhance the security
# for "The default user/group service should use digest password encoding" see
# http://suite.opengeo.org/docs/geoserver/security/tutorials/digest/index.html
#
# Configure the Digest authentication filter
# 1. Start GeoServer and log in to the web admin interface as the admin user.
# 2. Click the Authentication link located under the Security section of the navigation sidebar.
# 3. Scroll down to the Authentication Filters panel and click the Add new link.
# 4. Click the Digest link.
# 5. Fill in the fields of the settings form as follows:
#    Set Name to "digest"
#    Set User group service to "default"
# 6. Save.
# 7. Back on the authentication page scroll down to the Filter Chains panel.
# 8. Select "Default" from the Request type drop down.
# 9. Unselect the basic filter and select the digest filter. Position the digest filter before the anonymous filter.
# 10. Save.
#
# now go to users, groups, roles
# under User Group Services, click default
# under Password encryption, change to digest
# save
- Secure tomcat
- We used a service called Nessus to review the VM's tomcat port
- it found some minor issues
- Apache Tomcat servlet/JSP container default files
Synopsis: The remote web server contains example files.
Description: Example JSPs and Servlets are installed in the remote Apache Tomcat servlet/JSP container. These files should be removed as they may help an attacker uncover information about the remote Tomcat install or host itself, or they may themselves contain vulnerabilities such as cross-site scripting issues.
Solution: Review the files and delete those that are not needed.
Risk Factor: Medium
CVSS Base Score: 6.8 (CVSS2#AV:N/AC:M/Au:N/C:P/I:P/A:P)
Plugin Information: Publication date: 2004/03/02, Modification date: 2012/01/20
Ports: tcp/8081
The following default files were found:
/examples/servlets/index.html
/examples/jsp/snp/snoop.jsp
/examples/jsp/index.html
Just remove this directory?
cd /usr/share/apache-tomcat-6.0.37/webapps
rm -R -f examples
- CGI Generic Cross-Site Scripting (persistent, 3rd Pass)
Synopsis: A CGI application hosted on the remote web server is potentially prone to cross-site scripting attacks.
Description: The remote web server hosts one or more CGI scripts that fail to adequately sanitize request strings containing malicious JavaScript. By leveraging this issue, an attacker may be able to cause arbitrary HTML and script code to be executed in a user's browser within the security context of the affected site. This script identified patterns that were injected to test 'reflected' (aka 'non-persistent') XSS; the issues are likely to be 'persistent' (or 'stored') after all.
Solution: Restrict access to the vulnerable application and contact the vendor for a patch or upgrade.
5.2 Postgres standalone
5.2.1 PostgreSQL /postgresql.html
## Install PostgreSQL 9.2
PostgreSQL is an Open Source database that can be extended with GIS functionality using the PostGIS tools. The latest version is 9.2. Please see [this link](http://people.planetpostgresql.org/devrim/index.php?/archives/70-How-to-install-PostgreSQL-9.2-on-RHELCentOSScientific-Linux-5-and-6.html) for the original documentation I used to install this on Centos or Redhat 6.4. Check the correct download from [this link](http://yum.postgresql.org/repopackages.php#pg92). Please note that this didn't work on Ubuntu 12.04 LTS for me in early 2013 because PostGIS 2.0 was not included in their repositories; I ended up rolling back to PostgreSQL 9.1 on that machine.
#### Code
# install the PostgreSQL 9.2 repo package
rpm -ivh http://yum.postgresql.org/9.2/redhat/rhel-6-x86_64/pgdg-centos92-9.2-6.noarch.rpm
# install PostgreSQL 9.2 with a single command:
yum groupinstall "PostgreSQL Database Server PGDG"
# This will install the PostgreSQL 9.2 server, along with the -contrib subpackage.
# Once it is installed, first initialize the cluster:
service postgresql-9.2 initdb
# Now, you can start PostgreSQL 9.2:
service postgresql-9.2 start
# If you want PostgreSQL 9.2 to start on every boot, run this:
chkconfig postgresql-9.2 on
- Configure PostgreSQL connection settings
#### Code
# edit your pg_hba.conf file under /var/lib/pgsql/9.2/data
# I added a super user from my ip address and allowed all the local ip addresses access
host    all    postgres    my.desk.ip.address/32    md5
# if you want to allow data sharing on a specific database then create a public user
# host    dbname    publicdata    0.0.0.0/0    md5
# if you want people to access from a subnet at your work
# host    dbname    username    ip.address.range.0/24    md5
#### Code
# connect to psql and set the postgres password
su - postgres
psql postgres postgres
alter user postgres with password 'password';
select pg_reload_conf();
# close psql using \q
# change back to root
exit
Now make the server listen for any connection requests from anywhere in the world.
#### Code
# First locate the postgresql.conf file under /var/lib/pgsql/9.2/data.
# uncomment and change from localhost
# listen_addresses = '*'
# then restart the server
sudo service postgresql-9.2 restart
# then reboot and confirm postgres started
- Allow connection to postgres through the firewall
#### Code: setting-ports
vi /etc/sysconfig/iptables
# and add the line
-A INPUT -m state --state NEW -m tcp -p tcp --dport 5432 -j ACCEPT
service iptables restart
5.2.2 PostGIS 2.0 /postgis.html
- Postgis
---
name: postgis
layout: default
title: PostGIS
---
<li><a href="/postgresql.html">Previous: PostgreSQL</a></li>
<li><a href="/postgres-migrate.html">Next: PostgreSQL Migrate</a></li>
## Install PostGIS 2.0
The PostGIS suite enables a PostgreSQL database with spatial data types and analysis functions.
#### References
[http://www.davidghedini.com/pg/entry/postgis_2_0_on_centos](http://www.davidghedini.com/pg/entry/postgis_2_0_on_centos)
[http://people.planetpostgresql.org/devrim/index.php?/archives/64-PostGIS-2.0.0,-RPMs-and-so..html](http://people.planetpostgresql.org/devrim/index.php?/archives/64-PostGIS-2.0.0,-RPMs-and-so..html)
[http://people.planetpostgresql.org/devrim/index.php?/archives/65-Installing-PostGIS-2.0.X-on-RHELCentOSScientific-Linux-5-and-6-Fedora-That-is-easy!.html](http://people.planetpostgresql.org/devrim/index.php?/archives/65-Installing-PostGIS-2.0.X-on-RHELCentOSScientific-Linux-5-and-6-Fedora-That-is-easy!.html)
#### code: install-postgis2
yum list postgis*
yum install postgis2_92.x86_64
yum install postgis2_92-devel.x86_64
- GDAL, PROJ and GEOS
#### Code
sudo rpm -Uvh http://elgis.argeo.org/repos/6/elgis/x86_64/elgis-release-6-6_0.noarch.rpm
sudo rpm -Uvh http://mirror.as24220.net/pub/epel/6/i386/epel-release-6-7.noarch.rpm
# yum list gdal*
yum install gdal-devel.x86_64
yum install proj-devel.x86_64
yum install proj-nad.x86_64
yum install proj-epsg.x86_64
yum install geos-devel.x86_64
# to update
yum remove gdal*
yum install gdal-devel.x86_64
- Create Database
#### Code: Create Database
su - postgres
createdb mydb
psql -d mydb -U postgres
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;
- Create a GIS user and a group
#### Code: Create a GIS user and a group
CREATE ROLE public_group;
CREATE ROLE ivan_hanigan LOGIN PASSWORD 'password';
GRANT public_group TO ivan_hanigan;
grant usage on schema public to public_group;
GRANT select ON ALL TABLES IN SCHEMA public TO public_group;
grant execute on all functions in schema public to public_group;
grant select on all sequences in schema public to public_group;
grant select on table geometry_columns to public_group;
grant select on table spatial_ref_sys to public_group;
grant select on table geography_columns to public_group;
grant select on table raster_columns to public_group;
grant select on table raster_overviews to public_group;
\q
exit
- Specific transformations grid for Australian projections AGD66 to GDA94
#### Additional Reprojection File
A special transformations grid file is required to be added to the PROJ.4 files for reprojecting the Australian projections AGD66 to GDA94. Thanks to [Joe Guillaume](https://github.com/josephguillaume) and [Francis Markham](http://stackoverflow.com/users/103225/fmark) for providing this solution.
#### Code: transformations grid for Australian projections
cd /usr/share/proj
# the original was moved
# wget http://www.icsm.gov.au/icsm/gda/gdatm/national66.zip
wget http://www.icsm.gov.au/gda/gdatm/national66.zip
# if it moves again a version of it is included with this repo
# from your local scp to your server
yum install unzip
unzip national66.zip
mv "A66 National (13.09.01).gsb" aust_national_agd66_13.09.01.gsb
su - postgres
psql -d mydb
UPDATE spatial_ref_sys SET proj4text='+proj=longlat +ellps=aust_SA +nadgrids=aust_national_agd66_13.09.01.gsb +wktext' where srid=4202;
\q
exit
- Test transform
#### Code: test transformation from AGD66 to GDA94
select geocode, geoname, st_transform(geom, 4283) as the_geom
into schema.gda94_table
from schema.agd66_table;
#### FROM YOUR R SERVER OR LOCAL ####
# R
require(devtools)
install_github("swishdbtools", "swish-climate-impact-assessment")
require(swishdbtools)
pwd <- getPassword(remote=F)
ch <- connect2postgres2("django") # ("ip.address", "dbname", "postgres", p = pwd)
## dbSendQuery(su,
##   "UPDATE spatial_ref_sys SET
##    proj4text='+proj=longlat +ellps=aust_SA +nadgrids=aust_national_agd66_13.09.01.gsb +wktext'
##    where srid=4202;
##   ")
# now you can go ahead and convert AGD66 (4202) to GDA94 (4283)
sql <- sql_subset_into(ch, "schema.agd66_table",
  select = 'geocode, geoname, st_transform(geom, 4283) as the_geom',
  into_schema = "schema", into_table = "gda94_table",
  eval = F, check = F)
cat(sql)
5.2.3 PostgreSQL Migration /postgres-migrate.html
- Migrate the data
---
name: postgres-migrate
layout: default
title: PostgreSQL migration from default location
---
<li><a href="/postgis.html">Previous: PostGIS</a></li>
<li><a href="/sharedmemory.html">Next: Important shared memory settings</a></li>
To take advantage of the extra storage on the secondary disk we mounted in the initial configuration, do the following.
#### Code: Migrate PostgreSQL Data
service postgresql-9.2 stop
# Copy the pgsql directory from /var/lib (or custom install directory) to another location
cp -r /var/lib/pgsql /home/pgsql
chown -R postgres:postgres /home/pgsql
# Edit the start script 'postgresql'
vi /etc/init.d/postgresql-9.2
# Search for the parameter PGDATA, which would be entered as "PGDATA=/var/lib/pgsql"
# Edit the line so that PGDATA points to the new location, e.g. "PGDATA=/newloc/pgsql"
# ALSO DO PGLOG, and PGUPLOG
# Save and exit the file
# Start the PostgreSQL service
service postgresql-9.2 start
# tidy up, but not too much, just the data?
# rm -r -f /var/lib/pgsql/9.2/data
- Set up backups
From here to a secure location at my work.
5.3 Tuning Postgres Performance
5.3.1 Important shared memory settings /sharedmemory.html
---
name: sharedmemory
layout: default
title: Important Shared Memory Settings
---
## Managing memory settings
The default settings in PostgreSQL are usually pretty good, but these memory settings are conservative to start with and often need modification.
## References
[Kernel memory limitations](http://michael.otacoo.com/postgresql-2/take-care-of-kernel-memory-limitation-for-postgresql-shared-buffers/)
#### Code: sharedmemory
vi /home/pgsql/9.2/data/postgresql.conf
# shared_buffers
# PostgreSQL has a default shared_buffers value of 32MB, which is
# enough for small configurations, but it is said that this
# parameter should be set at 25% of the system's RAM. This allows
# your system to keep good performance in parallel with the
# database server. So in the case of a machine with 4GB of RAM,
# you should set shared_buffers at 1GB. 2GB = 2048MB
# also look at max_locks_per_transaction. tried setting to 1000???
################################################################
# http://www.postgresql.org/docs/9.2/static/kernel-resources.html
# Linux
# The default maximum segment size is 32 MB, which is only
# adequate for very small PostgreSQL installations. The
# default maximum total size is 2097152 pages. A page is
# almost always 4096 bytes except in unusual kernel
# configurations with "huge pages" (use getconf PAGE_SIZE to
# verify). That makes a default limit of 8 GB, which is often
# enough, but not always.
# The shared memory size settings can be changed via the
# sysctl interface. For example, to allow 16 GB:
sysctl -w kernel.shmmax=17179869184
sysctl -w kernel.shmall=4194304
# In addition these settings can be preserved between reboots
# in the file /etc/sysctl.conf. Doing that is highly
# recommended.
# The remaining defaults are quite generously sized, and
# usually do not require changes.
# also see
# http://www.linux.com/learn/tutorials/394523-configuring-postgresql-for-pretty-good-performance
# work_mem 4MB
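The arithmetic behind those sysctl values can be checked with a quick shell calculation; a sketch for the 16 GB target used above, assuming the usual 4096-byte page size:

```shell
# kernel.shmmax: maximum shared memory segment size in bytes (16 GB here)
shmmax=$((16 * 1024 * 1024 * 1024))
# kernel.shmall: total shared memory, expressed in pages (page size 4096 bytes)
shmall=$((shmmax / 4096))
echo "kernel.shmmax=$shmmax"   # → kernel.shmmax=17179869184
echo "kernel.shmall=$shmall"   # → kernel.shmall=4194304
```

Substituting your own RAM budget into `shmmax` gives the matching `shmall` for /etc/sysctl.conf.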
5.3.2 TODO Postgres conf
I noticed when installing opengeo on RHEL and Ubuntu that the installer changes more parameters on the Ubuntu conf file. Compare these with diff and update RHEL to reflect the settings for ubuntu.
5.3.3 Test loading some shapefiles
On your Ubuntu desktop install PostGIS and GDAL (see above)
################################################################
sudo apt-get install postgis
then let's demo the Tasmanian SLAs:
- download the shapefiles
################################################################
# name:tassla06
# ABS spatial units are available at
# http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1259.0.30.0022006?OpenDocument
setwd('..')
dir.create('data')
setwd('data')
dir.create('abs_sla')
setwd('abs_sla')
download.file('http://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1259030002_sla06aaust_shape.zip&1259.0.30.002&Data%20Cubes&18E90A962EFD4D7ECA25795D00244F5A&0&2006&06.12.2011&Previous',
  'SLA06.zip', mode = 'wb')
unzip('SLA06.zip', junkpaths = T)
sink('readme.txt')
cat(paste('Australian Bureau of Statistics Statistical Local Areas 2006 downloaded on',
  Sys.Date(),
  ' from http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1259.0.30.0022006?OpenDocument'))
sink()
# and load spatial data (sd)
install.packages('rgdal')
require(rgdal)
sd <- readOGR('SLA06aAUST.shp', layer = 'SLA06aAUST') # might take a while
head(sd@data)
plot(sd)
dev.off()
save.image('aussd.Rdata')
######################
# tas
sd2 <- sd[sd@data$STATE_CODE == 6, ]
plot(sd2)
axis(1); axis(2); box()
# plot(sd, add = T)
names(sd2@data)
writeOGR(sd2, 'tassla06.shp', 'tassla06', 'ESRI Shapefile')
test <- readOGR(dsn = 'tassla06.shp', layer = 'tassla06')
plot(test, col = 'grey')
rm(sd)
# save.image('tassd.Rdata')
- upload with shp2pgsql
# psql -d ewedb -U postgres -h 115.146.94.209
# CREATE SCHEMA abs_sla;
# grant ALL on schema abs_sla to gislibrary;
cd data
shp2pgsql -s 4283 -D tassla06.shp public.tassla06 > tassla06.sql
# psql -d ewedb -U gislibrary -W -h 115.146.94.209 -f tassla06.sql
# warning terminal not fully functional? ran from normal terminal
# now on the remote server run
# psql ewedb postgres
# GRANT select ON ALL TABLES IN SCHEMA public TO gislibrary;
6 The Brains
The Brains is a Statistical Analysis engine (running the R environment) integrated with a metadata registry.
6.1 R Server
6.1.1 R
#rpm -Uvh http://mirror.as24220.net/pub/epel/6/i386/epel-release-6-7.noarch.rpm
rpm -Uvh http://mirror.overthewire.com.au/pub/epel/6/i386/epel-release-6-7.noarch.rpm
yum install R R-devel
#rpm -Uvh http://mirror.as24220.net/pub/epel/6/i386/epel-release-6-8.noarch.rpm
To update R as 'root' on your system, simply type:
yum update R
6.1.2 TODO package management and R updates
Kudos to http://zvfak.blogspot.com.au/2012/06/updating-r-but-keeping-your-installed.html. The problem is that when you update R you usually need to re-install your libraries, or change .libPaths() to point to a location that holds your previous libraries.
The solution below will work for unix-like operating systems including Mac OS X.
###########################################################################
# newnode: packageManagement
# First, we need a location to install all our packages from now on.
# This can be any directory, and the location of this directory should
# be indicated in the ~/.Renviron file. Let's create that directory now:
mkdir ~/Rlibs
# We created the Rlibs directory in our home directory. Now, create the
# .Renviron file in your home directory, enter the following line
# and save the .Renviron file:
R_LIBS=~/Rlibs
# We can now start R and install any library. The libraries will be
# installed to ~/Rlibs, and when we update R, R will still look for
# libraries in the ~/Rlibs directory so we don't need to re-install
# them. However, we will need to update the libraries in ~/Rlibs
# to their most recent versions. All we need to do is run update.packages()
# in the R console, and the libraries will be updated.
6.1.3 Rstudio
check out the RStudio versions at: http://rstudio.org/download/server
# on RHEL6 ran into dependencies:
# libcrypto.so.6()(64bit) is needed by rstudio-server-0.96.331-1.x86_64
# libgfortran.so.1()(64bit) is needed by rstudio-server-0.96.331-1.x86_64
# libssl.so.6()(64bit) is needed by rstudio-server-0.96.331-1.x86_64
# http://www.linuxquestions.org/questions/linux-software-2/need-libcrypto-so-6-64bit-and-libssl-so-6-64bit-for-redhat-6-a-873068/
# yum list openssl098e*
# yum install openssl098e.x86_64
# yum list compat-libgfortran*
# yum install compat-libgfortran-41.x86_64
wget http://download2.rstudio.org/rstudio-server-0.97.551-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-0.97.551-x86_64.rpm
wget http://download2.rstudio.org/rstudio-server-rhel-0.99.451-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-rhel-0.99.451-x86_64.rpm
rstudio-server verify-installation
6.1.4 firewall access
# kudos2 http://slinsmeier.wordpress.com/2012/05/19/creating-a-lab-environment-with-rstudio/
# It is necessary to open the firewall port to allow the browser
# access to RStudio: edit the /etc/sysconfig/iptables file and add the line
# -A INPUT -m state --state NEW -m tcp -p tcp --dport 8787 -j ACCEPT
# directly after the opening of the ssh port 22 (or copy that line and
# change the port 22 to 8787).
# reminder that you need to have the security group set up on the research
# cloud to allow tcp from 8787 to 8787 cidr 0.0.0.0/0
service iptables restart
Check that you can log on via port 8787. Note that this is an insecure R server; the following steps are required to make it secure. We will need to block port 8787 later on.
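The iptables rule only varies by port number between the 8787 step here and the 443 step later, so a small helper can generate it. This is a sketch (the `accept_rule` function is mine, not part of the original notes); the rule format follows the stock RHEL6 /etc/sysconfig/iptables style quoted above.

```shell
# Sketch: generate the iptables ACCEPT line for a given TCP port, so the
# same helper covers 8787 (RStudio) now and 443 (HTTPS) later.
accept_rule() {
  rule="-A INPUT -m state --state NEW -m tcp -p tcp --dport $1 -j ACCEPT"
  printf '%s\n' "$rule"
}
accept_rule 8787
accept_rule 443
```

Paste the emitted lines into /etc/sysconfig/iptables directly after the port 22 rule, then restart iptables.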
6.1.5 SSL/HTTPS and running a proxy server
sudo yum install httpd.x86_64
# run the following interactively
sudo openssl genrsa -out /etc/pki/tls/private/rstudio.ivan.com.key 1024
# the next is one line
sudo openssl req -new -key /etc/pki/tls/private/rstudio.ivan.com.key -x509 -out /etc/pki/tls/certs/rstudio.ivan.com.crt -days 365
sudo yum install mod_ssl.x86_64
# vi /etc/httpd/conf.d/ssl.conf
# Change the paths to match where the Key file is stored.
SSLCertificateFile /etc/pki/tls/certs/rstudio.ivan.com.crt
# Then set the correct path for the Certificate Key File a few lines below.
SSLCertificateKeyFile /etc/pki/tls/private/rstudio.ivan.com.key
mkdir /etc/httpd/sites
# vi /etc/httpd/conf/httpd.conf
# and add
Include /etc/httpd/sites/
# as the last line.
# vi /etc/httpd/sites/rstudio-ivan.com
# insert
<VirtualHost *:80>
ServerName rstudio.ivan.com
RedirectMatch ^(.*)$ https://rstudio.ivan.com$1
</VirtualHost>
# goodo
# vi /etc/httpd/conf.d/ssl.conf
# add
<Proxy *>
Allow from localhost
</Proxy>
ProxyPass / http://localhost:8787/
ProxyPassReverse / http://localhost:8787/
# before </VirtualHost>
/etc/init.d/httpd restart
# weird error? ignore?
# Stopping httpd: [60G[FAILED]
# Starting httpd: httpd: apr_sockaddr_info_get() failed for i-00002979
# httpd: Could not reliably determine the server's fully qualified domain
#   name, using 127.0.0.1 for ServerName
# [60G[ OK ]
sudo chkconfig httpd on
# sudo vi /etc/sysconfig/iptables
# modify the previously added line for 8787 to
# -A INPUT -m state --state NEW -m tcp -p tcp --dport 443 -j ACCEPT
# change the research cloud firewall rules to reflect this change
sudo service iptables restart
# sudo vi /etc/rstudio/rserver.conf
www-address=127.0.0.1
sudo /etc/init.d/rstudio-server restart
# now going to https://your.new.ip.address/
# should ask you to add an exception
# can also try sudo reboot?
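The RedirectMatch rule maps any plain-http request path onto the same path under https. A quick way to sanity-check the mapping offline is to emulate it with sed (a sketch only; `rstudio.ivan.com` is the example hostname used throughout this guide):

```shell
# Sketch: emulate RedirectMatch ^(.*)$ https://rstudio.ivan.com$1
to_https() {
  printf '%s\n' "$1" | sed 's|^http://rstudio\.ivan\.com|https://rstudio.ivan.com|'
}
to_https 'http://rstudio.ivan.com/auth-sign-in'
```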
6.1.6 git
yum install git
# restart R studio
6.1.7 ssh for github
- in rstudio
- tools / options / version control
- create rsa key, ok, ok
- view pub key, copy, paste to your github account
6.1.8 gdal
sudo rpm -Uvh http://elgis.argeo.org/repos/6/elgis/x86_64/elgis-release-6-6_0.noarch.rpm
# yum list gdal*
yum install gdal-devel.x86_64 proj-devel.x86_64
6.1.9 geos
yum install geos-devel.x86_64
6.1.10 or under ubuntu
sudo apt-get update
sudo apt-get install libgdal1-dev
sudo apt-get install libproj-dev
# OR
# You need the development packages of GDAL and proj4. Probably easier to
# install from the repository than from source. Try:
# sudo apt-get install libgdal1-dev libproj-dev
# sudo R
# install.packages("rgdal")
6.1.11 test readOGR
NB only possible once the PostGIS install is complete, see below.
################################################################
# name:readOGR2
readOGR2 <- function(hostip = NA, user = NA, db = NA, layer = NA, p = NA) {
  # NOTES
  # only works on Linux OS
  # returns uninformative error due to either bad connection or lack of
  # record in geometry column table. can check if connection problem
  # using a test connect?
  # TODO add a prompt for each connection arg if isna
  if (!require(rgdal)) install.packages('rgdal', repos = 'http://cran.csiro.au'); require(rgdal)
  if (is.na(p)) {
    pwd <- readline('enter password (ctrl-L will clear the console after): ')
  } else {
    pwd <- p
  }
  shp <- readOGR(sprintf('PG:host=%s user=%s dbname=%s password=%s port=5432',
    hostip, user, db, pwd), layer = layer)
  # clean up
  rm(pwd)
  return(shp)
}
tassla06 <- readOGR2(hostip = '115.146.95.82', user = 'gislibrary',
  db = 'ewedb', layer = 'tassla06')

# func
if (!require(rgdal)) install.packages('rgdal'); require(rgdal)
if (!require(ggmap)) install.packages('ggmap'); require(ggmap)
epsg <- make_EPSG()
# load
latlong <- read.table(tc <- textConnection(
"ID POINT_Y POINT_X
1 150.5556 -35.09305
2 150.6851 -35.01535
3 150.6710 -35.06412
4 150.6534 -35.08666
"), header = TRUE); close(tc)
# do
for (i in 1:nrow(latlong)) {
  coords <- as.numeric(latlong[i, c('POINT_Y', 'POINT_X')])
  e <- as.data.frame(cbind(i, t(coords), revgeocode(coords)))
  write.table(e, "test.csv", sep = ',', append = i > 1, col.names = i == 1, row.names = F)
}
d <- read.csv('test.csv')
head(d)
## Treat data frame as spatial points
pts <- SpatialPointsDataFrame(cbind(d$V2, d$V3), d,
  proj4string = CRS(epsg$prj4[epsg$code %in% '4283']))
writeOGR(pts, 'test.shp', 'test', driver = 'ESRI Shapefile')
6.1.12 rgraphviz
wget http://www.graphviz.org/graphviz-rhel.repo
mv graphviz-rhel.repo /etc/yum.repos.d/
yum list available 'graphviz*'
yum install 'graphviz*'
# as root
R
source('http://bioconductor.org/biocLite.R')
biocLite("Rgraphviz")
q()
6.1.13 test
###########################################################################
# newnode: test-rgraphviz
# try newnode_test from git@github.com:ivanhanigan/disentangle.git
6.1.14 install just the postgres bits required for RPostgreSQL package
###########################################################################
# newnode: rpostgresql
vi /etc/yum.repos.d/CentOS-Base.repo
# append: exclude=postgresql* to [base] and [updates] sections
curl -O http://yum.postgresql.org/9.1/redhat/rhel-6-x86_64/pgdg-centos91-9.1-4.noarch.rpm
rpm -ivh pgdg-centos91-9.1-4.noarch.rpm
# kudos2 http://www.davidghedini.com/pg/entry/install_postgresql_9_on_centos
# Many, if not most, third party software and modules are still set to look
# for PostgreSQL's conf file and data directory under their old
# (pre-version 9) locations.
# You can address this, and make life easier for yourself, by creating a
# few symlinks from the new locations to the old.
# Symlink 1: symlink for the binary directory. This is particularly useful
# as this is the location of the pg_config file.
# so install the basic packages for a database
# install a basic PostgreSQL 9.1 server:
yum install postgresql91-server postgresql91 postgresql91-devel postgresql91-libs postgresql91-contrib
# THIS LINE HERE
ln -s /usr/pgsql-9.1/bin/pg_config /usr/bin
# now check R
install.packages('RPostgreSQL') # works?
6.1.15 postgis utilities
################################################################
# name:postgisutils
yum list postgis2*
yum install postgis2_91.x86_64 postgis2_91-devel.x86_64
# try
/usr/pgsql-9.1/bin/raster2pgsql
# to use this and psql via the command line you need to add a .pgpass, using R:
su - user
R
sink('~/.pgpass')
cat('hostname:port:database:username:password')
sink()
q()
exit
chmod 0600 /home/user/.pgpass
# see http://www.postgresql.org/docs/current/static/libpq-pgpass.html
# and other methods
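The .pgpass file can also be written straight from the shell rather than via R's sink(), and it is worth confirming the 0600 mode because libpq silently ignores the file if the permissions are wider. A sketch, using the placeholder connection values from above and a demo path so it does not clobber a real ~/.pgpass:

```shell
# Sketch: create a .pgpass entry and confirm the 0600 permissions that
# libpq requires. Replace the placeholder fields with real values.
pgpass="$HOME/.pgpass.demo"   # demo path; the real file is $HOME/.pgpass
printf '%s\n' 'hostname:5432:ewedb:gislibrary:password' > "$pgpass"
chmod 0600 "$pgpass"
stat -c '%a' "$pgpass"        # libpq ignores the file unless this is 600
```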
6.1.16 TODO unixODBC
################################################################
# name:unixODBC
yum list unixODBC*
yum install unixODBC.x86_64 unixODBC-devel.x86_64
# unixODBC-kde.x86_64
R
install.packages('RODBC')
require(RODBC)
6.2 Test the Backups of this Minimal R Server.
To test that this setup will restore successfully if there is a crash or the VM becomes corrupted, take a snapshot from the ResearchCloud dashboard website and launch it as a new VM. Note that this will only restore the 10GB primary disc, which holds our software but not the user data on the secondary disc, so also try mounting that and check that the new VM is a good replacement for the old one.
6.2.1 Backup the 2nd Disc
################################################################
# name:2ndDisc
# on your secure local machine
mkdir /projects/OpenSoftware-RestrictedData/Archives
# on your remote server
ssh root@ip.address.of.server
cd /
mkdir backups
cd backups
zip -r homebackup_yyyymmdd /home
# scp homebackup_yyyymmdd.zip root@ip.address.of.server:/Archives
# or use the R studio download file function
# upload to the new R server
exit
scp /Archives/homebackup.zip root@ipaddress:/home
# and unzip (TODO need to mv these up to the /home/ level after unzip)
ssh root@ip.address.of.server
cd home
unzip homebackup.zip
rm homebackup.zip
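The `yyyymmdd` suffix above is filled in by hand; it can instead be generated with `date` so repeated backups never overwrite each other. A sketch under the assumption that tar is acceptable in place of zip (it is more universally installed), archiving a scratch directory rather than the real /home:

```shell
# Sketch: build a dated archive name (homebackup_yyyymmdd) automatically.
src=$(mktemp -d)              # stand-in for /home in this sketch
touch "$src/example.txt"
stamp=$(date +%Y%m%d)
archive="homebackup_${stamp}.tar.gz"
tar -czf "$archive" -C "$src" .
ls "$archive"
```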
6.2.2 Launch from this snapshot and test the R server and 2nd Disc
6.2.3 TODO the permissions of the user on their home directory cause issues for logging in.
I've just used userdel -r username, then useradd etc., and used the user's own account to scp the backup files across. One solution is not to offer backup services; then users would have to undertake to load all data again from a fresh server if it failed. Data could be backed up on Brawn only?
6.3 Oracle XE Permissions and Users System
We install Oracle XE to use as a data registry, with an HTML-DB application available from XXXXXX. The application is called OraPUS because the Oracle XE licence restricts any references to Oracle in the name of the application. There are other restrictions on this licence, such as XXXXX.
6.3.1 backup local ubuntu version
I have an Ubuntu PC as my main desktop (and also backups). To create Oracle on this I needed a CentOS virtual machine; use the instructions at http://extr3metech.wordpress.com/2012/10/25/centos-6-3-installation-in-virtual-box-with-screenshots/ and also the comments, i.e.: "@Gunwant, one way of configuring the network: open your VirtualBox settings, go to the 'Network' tab and set the 'Attached to' option to 'Bridged Adapter'."
Then boot your centos virtual machine, and type the following in your terminal:
And then change the file to the following:
DEVICE="eth0"
BOOTPROTO="dhcp"
ONBOOT="yes"
And save the file and exit. After that you need to restart the network service and you can do that by typing the following in the terminal:
Now check your network configuration; it should work.
6.3.2 INIT
################################################################
# name:init
yum update
# SElinux config
# vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#   enforcing - SELinux security policy is enforced.
#   permissive - SELinux prints warnings instead of enforcing.
#   disabled - No SELinux policy is loaded.
SELINUX=enforcing
# Change SELINUX=enforcing to disabled; you must reboot the server after
# applying the change.
# vi /etc/hosts
127.0.0.1 localhost.localdomain localhost ADD-VMNAME
6.3.3 SWAP
MAKE SURE YOU HAVE 2GB OF SWAP. This is required for the install, but I have deleted it afterward and things still work (http://bvirtual.nl/2009/08/add-more-swap-space-to-running-linux.html). I set it to 3000 and didn't set it to start at boot, so I have to run swapon /swapfile each time.
################################################################
# name:swap
# Create an empty file called /swapfile of about 2GB
dd if=/dev/zero of=/swapfile bs=1024000 count=3000
# Format the new file to make it a swap file
mkswap /swapfile
# Enable the new swapfile. Only the swapon command is needed, but with
# the free command you can clearly see the swap space is made available
# to the system.
free -m | grep Swap
swapon /swapfile
free -m | grep Swap
# TOO LAZY TO ADD TO BOOT SO JUST DO SWAPON EVERY TIME?
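The size of the file dd creates is simply bs times count, so bs=1024000 count=3000 gives roughly 3GB. A tiny-scale sketch of the same arithmetic that is safe to run anywhere:

```shell
# Sketch: dd writes bs * count bytes; here 1024 * 4 = 4096 bytes.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1024 count=4 2>/dev/null
size=$(wc -c < "$f")
echo "$size"
rm -f "$f"
```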
6.3.4 DOWNLOAD AND SCP
Create a free Oracle account, download the installer locally, and then upload it to the server.
scp oracle-xe-11.2.0-1.0.x86_64.rpm.zip user@ip.address.of.server:/home/
6.3.5 Install the database
I followed these online guides: http://youtu.be/DRZg8tfmldA (this needs VNC or other) http://www.davidghedini.com/pg/entry/install_oracle_11g_xe_on
yum install libaio bc flex unzip
cd /home
mkdir OracleXE
unzip oracle-xe-11.2.0-1.0.x86_64.rpm.zip -d OracleXE
cd OracleXE/Disk1
rpm -ivh oracle-xe-11.2.0-1.0.x86_64.rpm
/etc/init.d/oracle-xe configure
# accept defaults
# set up the /etc/group file so that oracle is in dba (and oinstall? not there?)
id -a oracle
# log on as oracle and then
su - oracle
# set the environment
cd /u01/app/oracle/product/11.2.0/xe/bin
. ./oracle_env.sh
# NOT DONE, which users need this? which dot file? both? To set
# the environment permanently for users, add the following to the
# .bashrc or .bash_profile of the users you want to access the
# environment:
#   . /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh
sqlplus /nolog
connect sys/password as sysdba
# To allow remote access to the Oracle 11g XE GUI (as well as the
# Application Express GUI) issue the following from SQL*Plus
EXEC DBMS_XDB.SETLISTENERLOCALACCESS(FALSE);
exit
su - root
# modify firewall
# vi /etc/sysconfig/iptables
# copy
#   -A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
# twice and modify for 8080 and 1521
# restrict 1521 to your administration IP address
# will also need the port 8080 enabled on the research cloud firewall
service iptables restart
Now we can check the database service is running by pointing a browser at http://ip.address.of.server:8080/apex
Although Apex is already installed, we still need to set the Internal Admin password.
# To do so, run apxchpwd.sql located under /u01/app/oracle/product/11.2.0/xe/apex:
su - oracle
. /u01/app/oracle/product/11.2.0/xe/bin/oracle_env.sh
sqlplus /nolog
connect sys/password as sysdba
@/u01/app/oracle/product/11.2.0/xe/apex/apxchpwd.sql
# Note: pick something simple like Password123! as you will be prompted to
# change it on first log in anyway.
# access the Application Express GUI at:
# http://ip.address.of.server:8080/apex/f?p=4550:1
Now log on:
- Workspace: Internal
- User Name: admin
- Password: (whatever you selected above).
- log in, change password
- log in again
Now create Workspace:
- Workspace Name = DDIINDEXDB
- next page say existing schema "no" and write DDIINDEXDB into the schema Name
- schema password same as the user you want
- 100MB
- admin user IVAN
6.3.6 import application and set Security
Import the application (number 102), then set authentication. I followed the advice in the Oracle documentation, "Changing the Authentication Scheme Associated with an Application" (http://docs.oracle.com/cd/E17556_01/doc/user.40/e15517/sec.htm#BCGICEAH). To change the authentication scheme for an application:
Navigate to the Authentication Schemes:
- On the Workspace home page, click the Application Builder icon.
- Select an application.
- On the Application home page, click Shared Components.
- The Shared Components page appears.
- Under Security, select Authentication Schemes.
- Click the Change Current tab at the top of the page.
- From Available Authentication Schemes, select a new authentication scheme and click Next.
- (Application Express is good)
- Click Make Current.
6.3.7 TODO explain the table creation script
now go to lib/setuporacleofdelphe.r to build the tables
6.3.8 set up R, RJDBC and ROracle
I recommend using RJDBC; I found ROracle suboptimal.
su root
# make sure env is set
export LD_LIBRARY_PATH=/u01/app/oracle/product/11.2.0/xe/lib:$LD_LIBRARY_PATH
R
connect2oracle <- function() {
  if (!require(RJDBC)) install.packages('RJDBC'); require(RJDBC)
  drv <- JDBC("oracle.jdbc.driver.OracleDriver",
    '/u01/app/oracle/product/11.2.0/xe/jdbc/lib/ojdbc6.jar')
  p <- readline('enter password: ')
  h <- readline('enter target ipaddress: ')
  d <- readline('enter database name: ')
  ch <- dbConnect(drv, paste("jdbc:oracle:thin:@", h, ":1521", sep = ''), d, p)
  return(ch)
}
ch <- connect2oracle()
# OR
# install.packages('ROracle')
6.3.9 TODO maintenance
6.3.10 unlock
6.3.11 unlock header
---
name: unlock
layout: default
title: unlock
---
http://riaschissl.blogspot.com.au/2010/07/oracle-unexpire-and-unlock-accounts.html
Some of our customers' applications are built around Oracle, so we have to fight the beast from time to time. Unfortunately, some of the surprises the beast has to offer are quite random and rare, and due to this we tend to simply forget how we fixed and/or circumvented the issues previously. As usual, Google is your friend, and one of the most valuable resources on the net we've found is www.orafaq.com, most notably its security FAQs [1]. So this is just an attempt at a small cheat sheet to help our overloaded brains :-)

Expired and locked accounts, the basics: as of version 11g, Oracle has enabled account expiration by default for many vital accounts (such as SYSMAN, SYS, ...). Quite a weird idea in my view, but who knows what hyper security things some wise engineer had in mind when doing so. Beware that we run Oracle under various Linux flavours, so things might be different for you.

Become database admin: first log into your host running Oracle and become the oracle user.

#### Code:unlock
sqlplus /nolog
connect / as SYSDBA

Find out which accounts are expired:

select username, account_status from dba_users where ACCOUNT_STATUS LIKE '%EXPIRED%';

Unexpire an account: once an account has been expired, it can only be revived by assigning it a new password:

ALTER USER scott IDENTIFIED BY password;

Unlock an account:

ALTER USER scott ACCOUNT UNLOCK;

Disable default password expiry [2]: this all depends on the profile a user belongs to. To disable password expiry for all users assigned the default user profile, do this:

ALTER PROFILE DEFAULT LIMIT PASSWORD_LIFE_TIME UNLIMITED;
6.3.12 extending/revising the database to track servers and files
Aim:
- the db is good for files, and even books etc.
- I think it might also be good for tracking server maintenance tasks
- I will experiment by modifying the names of the data entry fields ie from file to server
- and the tables stay the same. then I can also modify the app.
- change the names and fields
- go to page, edit page, remove fields one at a time
- keep the pages to the bare minimum
- go to edit page and change their names
##### stdydscr becomes Projects
- idno becomes projectid
- titl = projecttitle
- authenty = primaryinvestigator
- distrbtr = contactaddress
- abstract = businessrequirements
##### filedscr becomes servers
- filename = vmname
- Filelocation = ipaddress
- Filedscr = Description
- reqid = authorisedaccessgroupid
- notes = Server function
##### datadscrs becomes tasks
- LABL = task
- SAVE VERSIONS
export application to dropbox/Catalogue/BACKups
6.4 DDIindex
The DDIindex is a Java based catalogue that was written by the Australian Data Archives at the ANU (formerly the ASSDA, http://assda.anu.edu.au/ddiindex.html). The original application source is still available from the ASSDA site, but I am using the one I have archived because it has some configurations that I don't remember how to make again at the moment.
6.4.1 TOMCAT
http://coastalrocket.blogspot.com.au/2012/08/installing-geostack-on-centos-6-64bit.html
###########################################################################
# newnode: install-tomcat
yum -y install tomcat6 tomcat6-webapps # not tomcat6-admin-webapps
vi /etc/tomcat6/tomcat-users.xml
# add a user with roles of admin,manager like this
# <user name="tomcat" password="password" roles="admin,manager" />
###########################################################################
# newnode: for-restarts
chkconfig tomcat6 on
changing the Tomcat Port
- Locate server.xml in {Tomcat installation folder}/conf/
- change <Connector port="8080"/> to port="8181"
- and connector executor too? only if it is not commented out (was on centos, not on RHEL)
- change iptables and restart
service tomcat6 restart
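The port change above can also be scripted with sed instead of editing server.xml by hand. A sketch run on a sample fragment here; on the real server you would point it at /etc/tomcat6/server.xml (and still check the executor connector by eye, per the note above):

```shell
# Sketch: switch the Tomcat HTTP connector from 8080 to 8181 with sed.
cat > server.xml <<'EOF'
<Connector port="8080" protocol="HTTP/1.1" />
EOF
sed -i 's/port="8080"/port="8181"/' server.xml
grep 'Connector' server.xml
```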
6.4.2 UPLOAD THE DDIINDEX
6.4.3 MySQL
http://www.if-not-true-then-false.com/2010/install-mysql-on-fedora-centos-red-hat-rhel/
rpm -Uvh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
yum --enablerepo=remi,remi-test list mysql mysql-server
yum --enablerepo=remi,remi-test install mysql mysql-server
/etc/init.d/mysqld start ## use restart after update
## OR ##
service mysqld start ## use restart after update
chkconfig --levels 235 mysqld on
mysqladmin -u root password [your_password_here]
mysql -h localhost -u root -p
# then create the database, database user and database password
## CREATE DATABASE ##
mysql> CREATE DATABASE security;
## CREATE USER ##
mysql> CREATE USER 'ddiindex'@'localhost' IDENTIFIED BY 'BlahBlahBlah';
## GRANT PERMISSIONS ##
mysql> GRANT ALL ON security.* TO 'ddiindex'@'localhost';
## FLUSH PRIVILEGES, tell the server to reload the grant tables ##
mysql> FLUSH PRIVILEGES;
# and then use next at command line:
mysql -h localhost -u root -p security < security_2010-04-27-1259.sql
# Enter password:
# deprecated: let thru iptables port 3306?
6.4.4 make sure ddiindex can connect to mysql as ddiindex user
# now go to /usr/share/tomcat/webapps/ddiindex/WEB-INF/classes/index-conf.xml
# dburl change to
<dbUrl>jdbc:mysql://localhost:3306/security</dbUrl>
<dbUserName>ddiindex</dbUserName>
<dbPassword>BlahBlahBlah</dbPassword>
# other personalisations here such as <dataServer>
mkdir /opt/ddiindex/ewedb
mkdir /opt/ddiindex/ewedb_s
Removed <%@taglib uri="http://assda.anu.edu.au/ddiindex" prefix="ddiindex"%> from home.jsp, login.jsp and search.jsp.
6.4.5 TODO Check the security implications of allowing write permissions here
I found I need to allow write access to the xmldata folder for uploads of the xml metadata files.
adduser ddiindex
chown -R ddiindex /xmldata
chmod 0777 /xmldata # bad for security, try 0750 instead
# chmod 0777 /opt/ddiindex/nceph
# or this works
chmod -R 777 /opt/ddiindex/nceph
chmod 0777 /opt/ddiindex/nceph_s
chmod -R 777 /opt/ddiindex/ewedb
chmod 0777 /opt/ddiindex/ewedb_s
6.4.6 Running the Indexer
Log on as an admin user (set in the MySQL UserEJB table) and you can run indexer.jsp. Before doing this the first time, I did the following:
- went to nceph and nceph_s and deleted the recently created files
- went to xmldata, deleted all
- restart TOMCAT
- go to indexer, upload xml
- hit the 'add to index' button.
6.4.7 personalise the ddiindex
ie home.jsp <div class="intro">The Graduate Information Literacy Program at ANU now offers resources for researchers and graduate students on data management planning: a comprehensive Manual and a Data Management Planning Template. <a href="http://ilp.anu.edu.au/dm/" target="blank">http://ilp.anu.edu.au/dm</a></div>
6.5 Set up a private git server for data and code
6.6 True-Crypt encrypted volumes
The servers are secure, but we will sometimes want local copies, so we should use TrueCrypt or another option. I am using Ubuntu as my desktop, so here are the instructions.
################################################################
# http://www.liberiangeek.net/2012/07/install-truecrypt-via-ppa-in-ubuntu-12-04-precise-pangolin/
sudo add-apt-repository ppa:michael-astrapi/ppa
sudo apt-get update && sudo apt-get install truecrypt
#### FAILED TO INSTALL ON REDHAT ####
# or http://ubuntupop.blogspot.com.au/2012/10/how-to-install-truecrypt-on-centos-63.html
# 1. Download the Truecrypt package, or use wget:
# wget http://www.truecrypt.org/download/truecrypt-7.1a-linux-x64.tar.gz
# 2. Now extract the package
# tar xzvf truecrypt-7.1a-linux-x64.tar.gz
# 3. It will produce a file called truecrypt-7.1a-setup-x64
# chmod +x truecrypt-7.1a-setup-x64
# ./truecrypt-7.1a-setup-x64
# installer ran but then
# truecrypt -h
# truecrypt: error while loading shared libraries: libfuse.so.2: cannot
#   open shared object file: No such file or directory
# yum install libfuse.so.2 didn't help
# DIDN'T TRY http://vinnn.blogspot.com.au/2009/01/tryecrypt-6-on-centos-5.html
6.7 TODO ResearchData storage
################################################################
# name:ResearchData
cd /home
mkdir ResearchData
# add them to a particular group
groupadd researchdata
usermod -a -G researchdata username
chown -R root:researchdata /home/ResearchData
# TODO check what the security numeric codes do
chmod -R 0777 /home/ResearchData
chmod -R 0750 /home/ResearchData
service rstudio-server restart
# user may need to q() from R session and sign out of rstudio?
# for a user to have this link in their home go
su username
ln -s /home/ResearchData /home/username/ResearchData
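On the TODO above: mode 0750 grants the owner read/write/execute, the group read/execute, and nothing to others, which is why it is the safer choice over 0777 for /home/ResearchData. A sketch on a scratch directory (the group here is just the current user's primary group, standing in for researchdata):

```shell
# Sketch: what chmod 0750 means: owner rwx, group r-x, other none.
d=$(mktemp -d)
chgrp "$(id -gn)" "$d"   # stand-in for the researchdata group
chmod 0750 "$d"
stat -c '%a' "$d"        # group members can enter and read; others cannot
```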
6.8 TODO example workflow to dev a local data package and publish to shared
- work in personal workspace then as root
[root@i-00002e43 ResearchData]# mkdir GARNAUTCLIMATECHANGEREVIEW
[root@i-00002e43 ResearchData]# chown -R root:researchdata /home/ResearchData
[root@i-00002e43 ResearchData]# chmod -R 0750 /home/ResearchData
[root@i-00002e43 ResearchData]# ls -l
[root@i-00002e43 ResearchData]# cd GARNAUTCLIMATECHANGEREVIEW/
[root@i-00002e43 GARNAUTCLIMATECHANGEREVIEW]# cp -R /home/ivanhanigan/projects/GARNAUTCLIMATECHANGEREVIEW/* .
[root@i-00002e43 GARNAUTCLIMATECHANGEREVIEW]# rm GARNAUTCLIMATECHANGEREVIEW.Rproj
rm: remove regular file `GARNAUTCLIMATECHANGEREVIEW.Rproj'? y
6.9 TODO Share with windows network
7 Procedures for adding users to the system.
7.1 Description of the access procedure
7.1.1 Getting Access
The "Getting Access" procedure to help users apply for and gain access to mortality data is shown in Figure 1, and a more detailed description of the process is shown in the Appendix, Figure 4. The process is a set of formally defined steps that are designed to move the User through two general stages:
- Requesting data: guiding the researcher through the process of (a) gaining Ethics Approval from a Human Research Ethics Committee, and (b) Project Level Approval from the Registrar of Births, Deaths and Marriages.
- Providing data: provision of a confidentialised (often aggregated) dataset in an appropriately secured manner such as access to remote secure servers or encrypted archives accessed on local disk media, with security determined by (a) the nature of the data and (b) any project management related criteria.
7.1.2 Managing Access
Procedures for managing access are shown in Figure 2 (and in the Appendix, Figure 5). These activities are intended to maintain information on the current state of projects using the mortality data, and to report any changes in situation to the State Registries. The User Administrator is responsible for conduct of the "Managing Access" procedures.
The process is initiated by running a query on the ANU-User-DB to make a list of all Projects and Users, and then each Project is sent a reminder to report any changes in Project Status (sent annually to coincide with a similar reminder sent by the ANU Human Research Ethics Committee). The purpose of these reminders is to ensure that Project management plans continue to consider data security as a primary concern, even during long multi-year projects where many project management and staffing issues inevitably arise.
Once a response is received, the User Administrator then enters the relevant information into the ANU-User-DB, and if the project has been concluded will then initiate the final "Ending Access" process.
7.1.3 Ending Access
The procedure for ending access aims to ensure that data are both securely and sustainably stored. It is very important that files used for authorised projects are never re-used in un-authorised projects, but that future researchers may have the opportunity to create an authorised project and potentially replicate historical analyses. This is an important part of reproducible research and the robust practice of scientific enquiry.
7.2 The process user administrators go through to set up users
When adding users, go to the OraPUS (oraphi) application and track their permissions.
7.2.1 lodge request in user db
- new study into nceph ddiindexdb
- new file with filelocation = brawns EWEDB schema, working directory
- in primary database record (not project) add access requestor, accessors add accessor, date to end
- finalise approval
7.2.2 r-nceph
Production requests go through David Fisher; the testbed is done by me.
7.3 brains
7.3.1 linux user
adduser userxxx
passwd userxxx
# (I use R to randomly generate passwords by mixing words, letters,
# numbers and symbols)
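As a shell alternative to the R password generation mentioned above, /dev/urandom can be filtered down to a mixed character set. A sketch (the character set and 16-character length are my choices, not a site policy):

```shell
# Sketch: draw random bytes and keep only letters, digits and a few symbols.
pw=$(head -c 512 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9!@#%_' | head -c 16)
printf '%s\n' "$pw"
```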
7.3.2 mysql (check ddiindex)
mysql -u root -p security
select * from UserEJB;
insert into UserEJB (id, password, admin) values ('userxxx', 'passwd', 0);
- oracle (oraphi)
- log in as admin user to ddiindexdb workspace and go to administration,
- on the right there is a tasks pane with create user
- github
- downstream users can just download zips from github
- collaborators need to sign up to github
- will create a walk through for those
7.4 brawn
7.4.1 TODO postgres
We need to find out whether allowing 0.0.0.0/0 here is a big security risk.
# first edit your pg_hba.conf file, under /home/pgsql/9.2/data/pg_hba.conf
# I added a super user from my IP address and allowed all IP addresses
# access as gislibrary
host all postgres my.desk.ip.address/32 md5
host all gislibrary 0.0.0.0/0 md5
# and save. now
su - postgres
psql postgres postgres
CREATE ROLE userxxx LOGIN PASSWORD 'xxxxx';
GRANT gislibrary TO userxxx;
CREATE SCHEMA userxxx;
grant ALL on schema userxxx to userxxx;
GRANT ALL ON ALL TABLES IN SCHEMA userxxx TO userxxx;
grant ALL on all functions in schema userxxx to userxxx;
grant ALL on all sequences in schema userxxx to userxxx;
# might want to set a date for expiry?
# now
select pg_reload_conf();
# or if not connected just
service postgresql-9.2 restart
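While deciding whether the 0.0.0.0/0 rule is acceptable, it helps to be able to audit pg_hba.conf for wide-open entries. A sketch run against a sample file; on the server you would point the grep at the real pg_hba.conf:

```shell
# Sketch: flag pg_hba.conf rules that accept connections from any address.
cat > pg_hba.sample <<'EOF'
host all postgres my.desk.ip.address/32 md5
host all gislibrary 0.0.0.0/0 md5
EOF
grep -n '0\.0\.0\.0/0' pg_hba.sample   # lines listed here are open to the world
```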
7.4.2 geoserver
- general users don't need log in
- admin users can log on as admin?
7.4.3 alliance wiki
https://alliance.anu.edu.au/welcome/ — users log in with their ANU password.
7.4.4 email details without password
Dear user,
The new EWEDB server is now ready. You can configure your Kepler environment by following the tutorial at http://swish-climate-impact-assessment.github.io/2013/05/hello-ewedb/
The server details are:
  database = ewedb
  host     = ipaddress
  user     = username
Your password will be sent to you as an SMS to your designated mobile phone number.
NB Please note that the connection information will be saved in plain text to your user profile. As this presents a security risk we request that you only use this service from a secure computer.
Thank you,
Ivan Hanigan
Data Management Officer
National Centre for Epidemiology and Population Health
College of Medicine, Biology and Environment
Australian National University, Canberra, ACT, 0200
Ph: +61 2 6125 7767. Fax: +61 2 6125 0740. Mob: 0428 265 976.
CRICOS provider #00120C.
7.4.5 text message the set password
This solution is adequate. Might want to invest in a sacrificial SIM card to avoid having too many phone calls?
7.5 edits to oraphi via sql
update accessors set othermat = 'phone number' where accessornames = 'username';
7.5.1 revocation
userdel -r ivan_hanigan
8 Backups
8.1 General
8.1.1 maintenance
8.1.2 nearline for potential restore
8.1.3 archive and remove
8.2 General concerns
8.3 Brains
8.3.1 Backup oraphi
Needs to be done from Brains as it needs the oracle client set up.
###########################################################################
# newnode: backupOraphi
# using an emacs ssh buffer log on to target server, start R, start ESS-remote
# do this as root and put into /backups
if(!require(RJDBC)) install.packages('RJDBC'); require(RJDBC)
drv <- JDBC("oracle.jdbc.driver.OracleDriver",
            '/u01/app/oracle/product/11.2.0/xe/jdbc/lib/ojdbc6.jar')
ch <- dbConnect(drv, "jdbc:oracle:thin:@xx.yy.zz.ww:1521", "DDIINDEXDB", "password")
dir()
dir.create(paste('ddiindexdb-backups/csvs/', Sys.Date(), sep = ''), recursive = T)
for(tbl in c('STDYDSCR','DATADSCR','FILEDSCR','KEYWORDS',
             'ACCESSORS','ACCESSSTDY','OTHRSTDYMAT')){
  # tbl='STDYDSCR'
  dataout <- dbReadTable(ch, tbl) # sqlFetch(ch, tbl, as.is = T)
  write.table(dataout,
              paste('ddiindexdb-backups/csvs/', Sys.Date(), '/', tbl, '.csv', sep = ''),
              na = "", row.names = F, sep = ',', quote = T)
}
# download these to a backup location for storage.
8.3.2 sync the home directory
8.4 Brawn
8.4.1 Find out how big is it?
AKA brawn-dbsize
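One way to check, assuming you can run psql as a superuser on Brawn ('ewedb' is the production database named above):

```sql
-- overall database size, human-readable
SELECT pg_size_pretty(pg_database_size('ewedb'));
-- the ten largest tables (including indexes and TOAST data)
SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
```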
8.4.2 Dump and download it to a secure computer
- file system backup
- backup-brawn-filesystem header
- pgdump
Backup the pgdump to your workplace or another VM on the Research Cloud. First, set up the backup directory:
mkdir /home/backup
cd /home/backup
/usr/pgsql-9.2/bin/pg_dump -U postgres ewedb | gzip > ewedb_$(date +"%F-%H%M").gz
# see http://www.postgresql.org/docs/9.2/static/app-pgdump.html
# NB tried -Fc compressed version but took too long to restore
# see http://www.postgresql.org/docs/9.2/static/backup-dump.html
# 24.1.3. Handling Large Databases
Optionally you could set up a scheduled cron job
crontab -e
# 15 past midnight every day
15 0 * * * /home/ewedbbackup/backup_script.sh
Then add this script.
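A minimal sketch of what backup_script.sh could contain, reusing the pg_dump command above; the 30-day retention and the prune function are my assumptions, not the original script.

```shell
#!/bin/sh
# Sketch of backup_script.sh (paths and retention period are assumptions)
BACKUP_DIR="${BACKUP_DIR:-/home/backup}"
KEEP_DAYS="${KEEP_DAYS:-30}"

dump_ewedb() {
  # dump and gzip the database, timestamped as in the manual command above
  /usr/pgsql-9.2/bin/pg_dump -U postgres ewedb \
    | gzip > "$BACKUP_DIR/ewedb_$(date +%F-%H%M).gz"
}

prune_old_dumps() {
  # remove dumps older than KEEP_DAYS so the backup disk does not fill up
  find "$BACKUP_DIR" -name 'ewedb_*.gz' -mtime +"$KEEP_DAYS" -delete
}

# invoked from cron:
# dump_ewedb
# prune_old_dumps
```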
8.4.3 or simple home backup
# (set up passwordless ssh)
# http://www.linuxproblem.org/art_9.html
crontab -e
15 0 * * * rsync -arvR --delete home root@my.server.ip:/
# save and quit with :wq
From a backup machine:
- edit /etc/sysconfig/iptables, and also go to /etc/hosts.allow and allow this machine
- from the remote machine, copy the public key to the backup user's home directory on the backup machine, like this:
  scp ~/.ssh/id_rsa.pub root@115.146.92.162:
- copy the public SSH key into the list of authorised keys for that user:
  # mkdir -p .ssh
  cat id_rsa.pub >> .ssh/authorized_keys
Then lock down the permissions on .ssh and its contents, since the SSH server is fussy about this.
$ chmod 700 .ssh
$ chmod 400 .ssh/authorized_keys
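To double-check that the modes are what the SSH server expects, a small self-contained sketch (it uses a scratch directory purely for illustration; on the real backup machine you would run stat on ~/.ssh directly):

```shell
# Demonstration of the expected permission modes on .ssh and authorized_keys.
# Uses a throwaway directory so it can be run safely anywhere.
demo="$(mktemp -d)/.ssh"
mkdir -p "$demo"
touch "$demo/authorized_keys"   # note: the file must be named authorized_keys
chmod 700 "$demo"
chmod 400 "$demo/authorized_keys"
stat -c '%a' "$demo"                  # prints 700
stat -c '%a' "$demo/authorized_keys"  # prints 400
```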
That's it: passwordless SSH is now set up. Log out of the server, then log back in to confirm that no password prompt appears.
Now set up the remote production server to send its home directory to the backup server at a regular time:
crontab -e
15 0 * * * rsync -arvR --delete home root@my.server.ip:/home/brainsbackup/
# save and quit with ESC :wq
8.4.4 Restore database dump into a new database on another machine.
8.4.5 TODO launch a new Nectar VM from the snapshot image
8.4.6 TODO mount the 2nd disc and load/restore the postgres db and data into it
8.5 Disaster Recovery Plan
8.5.1 test a snapshot
Snapshots should probably be tested every three months.
- cleaning up links on github
In case of a completely failed VM, launching a fresh VM from the snapshot will give a different IP address, so GitHub links, other links, and users' bookmarks will need to be revised.
8.5.2 TODO 60GB disk is not being saved in snapshots
Will also need to investigate the restore procedure (and security) for the secondary disc.
8.5.3 TODO Restore ORAPHI
This should have restored OK, with data intact, from the last snapshot. If not, first set up Oracle and the tables, get the most recent backup of the tables, and load them into the new ORAPHI. In the event of an upgrade to a tested replacement, you may find that the contents of the running VM being retired are more current than those of the replacement server, so you will need to sync them.