Databases

Number of engines

ASE creates an OS-level process for each engine you define. You can change the number of engines with the sp_configure command:

sp_configure "max online engines", N

(replace N with the number of engines you want to configure).

Usually, if your server is almost exclusively used for ASE and you have X CPU cores available, you will want to configure X-1 engines. So assuming you have a dedicated server with 4 CPU cores, you’ll want to configure 3 engines.

You can also change this setting in the SYBASE.cfg:

[Processors]
max online engines = 3
number of engines at startup = 3

The first line sets the maximum number of engines to 3 and the second one specifies that all 3 engines will be brought online at startup.
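As a side note, the configuration file uses an INI-like format, so if you ever need to read these values from a script, Python's configparser can parse such a fragment (illustrative only; a real SYBASE.cfg contains many more sections):

```python
import configparser

# The "[Processors]" fragment shown above is valid INI syntax, so
# configparser can read the two engine parameters directly.
cfg_text = """
[Processors]
max online engines = 3
number of engines at startup = 3
"""
cfg = configparser.ConfigParser()
cfg.read_string(cfg_text)
print(cfg.getint("Processors", "max online engines"))  # 3
```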

In many cases, it makes sense to set both parameters to the same value so that all available engines are used automatically. But you can also set the second one to a lower value, benchmark the system with fewer engines, and then bring additional engines online one after another.

Increasing the max number of online engines to a higher number than the number of available logical CPUs makes no sense. So I’d always recommend setting it to the total number of logical CPUs or this number minus 1. Whether you bring them all online at startup or not depends on what else is running on the system and the specific workload you have on this server.

If you configure too many ASE engines for the underlying CPUs, you will observe a significant loss of throughput. This is due to the high number of involuntary context switches.

Hyper-Threading

Hyper-Threading creates “virtual CPUs”. So an application running on a system where Hyper-Threading is enabled will think that there are twice as many CPUs as are physically available. ASE does not distinguish between virtual CPUs and real CPUs.

Although Hyper-Threading provides the ability to run two ASE engines on one physical processor, you need to keep in mind that this is still not equivalent to running two engines on two physical processors.

In many cases, you should consider switching off Hyper-Threading. Unless you only have very few physical CPU cores available, HT will probably not bring you the expected benefits. You might run into problems because ASE doesn’t see that two logical CPUs are actually backed by the same physical CPU: it should rather distribute the load across physical CPUs instead of scheduling two queries on the two logical CPUs sharing one physical CPU. ASE could thus schedule queries running at the same time on the virtual CPUs only, while it would be better to spread them over the real CPUs (although theoretically, there should be no difference in performance between a virtual CPU and a real one).

But keep in mind that whether HT brings performance benefits or, on the contrary, makes your system slower really depends on the system itself. It highly depends on your actual hardware and workload. So benchmarking it on the specific system is still a good idea.

ASE 15.7 comes with a threaded kernel which takes advantage of threaded CPU architectures. Context switches then happen between threads instead of processes, which brings a performance boost. But this is not related to Hyper-Threading.

With the default kernel in ASE 15.7, each engine is a thread living in a thread pool instead of being an OS process (which was already the case for ASE on Windows even before ASE 15.7).

CPU Affinity

The processes for the different ASE engines have by default no affinity to the physical or virtual processors. Usually, it is not required to force any CPU affinity as ASE will handle it properly.

Here you see that there are 4 CPUs and 3 engines running, using CPU0, CPU1 and CPU3. You can also press “f”, “j” and return in top to display an additional column which explicitly shows which engine is using which CPU:

The column “P” shows that the 3 dataserver processes use CPUs 0, 1 and 3.

In some cases (it is not clear when or why this happens), you will see that all dataserver processes are using the same CPU even though they are processing different queries. Since multiple tasks have to be handled by the same CPU, each task gets slower, and the task switching causes a lot of overhead.

If this happens, you can use the “dbcc tune” command to configure a CPU affinity:

dbcc tune(cpuaffinity, -1, "on")

The -1 parameter is the start CPU. This one will always be skipped. So setting it to -1 means that:

The first engine will be bound to CPU0

The second one to CPU1

The third one to CPU2

If you want to keep CPU0 for other processes, you’d use:

dbcc tune(cpuaffinity, 0, "on")

This will do the following:

The first engine will be bound to CPU1

The second one to CPU2

The third one to CPU3

After that you should see that all dataserver processes are using different CPUs.

Note: The setting will be active only until the ASE server is restarted. So the dbcc tune command must be reissued each time ASE is restarted.

Also note that some operating systems do not support CPU affinity. In this case, the dbcc tune command will be silently ignored.

Even if the table is sorted, iterating through the rows and keeping track of the last SP1 you’ve seen will not help you, since you cannot delete the second row because you do not have anything to identify it (like ROWID in Oracle).

One way to handle it is getting a list of unique SP1 values and their row count:

SELECT SP1, count(*) as rcount FROM benohead GROUP BY SP1

This will return something like this:

SP1 rcount
----------- -----------
1 3
2 1
3 2
4 1

You can then iterate through this and, for each value of SP1, set rowcount to rcount-1 and delete entries with that SP1 value. In the end, you’ll have one row per SP1 value. Of course, if you just need the data and do not want to actually clean up the table, you’ll have to do it on a copy of the table.
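The approach can be sketched with Python and sqlite3 (an illustration of the logic only; in ASE you would use set rowcount and plain Transact-SQL — the table name and duplicate counts below are taken from the examples above):

```python
import sqlite3

# Example table with duplicated SP1 values (3x SP1=1, 2x SP1=3, as above)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE benohead (SP1 INT, SP2 INT, SP3 INT)")
cur.executemany(
    "INSERT INTO benohead VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (1, 6, 3), (2, 3, 4), (3, 5, 5), (3, 7, 6), (4, 1, 7)],
)

# Step 1: one row per SP1 value with its duplicate count
groups = cur.execute(
    "SELECT SP1, COUNT(*) AS rcount FROM benohead GROUP BY SP1"
).fetchall()

# Step 2: for each duplicated value, keep a single representative row
# (emulates "set rowcount rcount-1" + delete by deleting the whole group
# and re-inserting one representative)
for sp1, rcount in groups:
    if rcount > 1:
        keep = cur.execute(
            "SELECT SP1, SP2, SP3 FROM benohead WHERE SP1 = ? LIMIT 1", (sp1,)
        ).fetchone()
        cur.execute("DELETE FROM benohead WHERE SP1 = ?", (sp1,))
        cur.execute("INSERT INTO benohead VALUES (?, ?, ?)", keep)

print(cur.execute("SELECT COUNT(*) FROM benohead").fetchone()[0])  # 4
```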

Instead of deleting, you can also iterate through the values of SP1 and fetch the top 1 row for this value:

SELECT TOP 1 SP1, SP2, SP3 FROM benohead WHERE SP1=1

If you had only one additional column (e.g. SP2), it’d be even easier, since you could just use MAX and GROUP BY:

SELECT SP1, MAX(SP2) AS SP2 FROM benohead GROUP BY SP1

which returns:

SP1 SP2
----------- -----------
1 6
2 3
3 7
4 1

Unfortunately this doesn’t scale to multiple columns. If you also have SP3, you cannot use MAX twice since you will then get combinations which didn’t exist in the original table. Let’s insert an additional row:

INSERT INTO benohead VALUES(1,1,9)

The following statement:

SELECT SP1, MAX(SP2) AS SP2, MAX(SP3) AS SP3 FROM benohead GROUP BY SP1

will return for SP1=1 a row combining the maximum SP2 of one row with the maximum SP3 of another row, i.e. a combination which never existed in the table.
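A quick sqlite3 sketch of this pitfall (the first two SP3 values are made up for illustration; only the (1,1,9) row comes from the INSERT above):

```python
import sqlite3

# Three rows for SP1=1; no single row contains both SP2=6 and SP3=9
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE benohead (SP1 INT, SP2 INT, SP3 INT)")
cur.executemany(
    "INSERT INTO benohead VALUES (?, ?, ?)",
    [(1, 6, 2), (1, 2, 5), (1, 1, 9)],
)

row = cur.execute(
    "SELECT SP1, MAX(SP2) AS SP2, MAX(SP3) AS SP3 FROM benohead GROUP BY SP1"
).fetchone()
print(row)  # (1, 6, 9): SP2 and SP3 come from two different rows
```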

I wrote this cookbook about 10 months ago. It basically contains all the information about Sybase ASE I’ve documented on my blog over the years. I use it when I need offline access to some tricks (since I’m not getting younger, it’s sometimes useful to have a kind of brain dump somewhere). I also compiled it and published it here in the hope that someone else might find it useful.

I’ve just updated the cookbook with a few new things. But it’s really a small update.

This cookbook is still available for free. And I am still no professional writer and still cannot afford paying someone to proof-read it. So if you notice any mistakes, explanations which cannot be understood or anything like this, please leave a comment here or contact me at henri.benoit@gmail.com. I can’t guarantee how fast I can fix mistakes but I’ll do my best to do it in a timely manner.

The first column returned by this statement contains the table name (if you have names longer than 30 characters, you should replace 30 by something higher), then come the number of rows, the number of data pages, and the size in kilobytes.

If you have an ASE version older than 15, the statement above will not work but you can use the statement below instead:

I’m currently evaluating which NoSQL database we could use in a new project and thought I’d document the considered options and the relevant criteria here.

First let’s see what a NoSQL database is at all. NoSQL doesn’t necessarily stand for “No SQL” but rather for “Not only SQL”. This means it’s a database which can be used without SQL, but it doesn’t mean that none of them supports an SQL syntax. The goal is of course not to get rid of SQL but rather to support use cases which are currently not well supported by classical relational database management systems (e.g. Oracle, Sybase ASE, MS SQL Server, MySQL…).

Shortcomings of relational databases

So what are the main shortcomings of relational databases:

Effort to setup and maintain.

Scalability.

Performance.

Effort to setup and maintain

In a relational database, all entities stored in the database must be defined with a schema known in advance. The relationships between entities have to be modeled. Whenever you need to be able to store new types of data or whenever you need to store additional attributes, you need to update your schema and make sure that existing data are made compatible to the new schema.

Think of the database as a garage: the effort to set up comes from the fact that to park a car in there, you need to disassemble the car to be able to store it in the garage. This means that the whole structure of the data stored in the database needs to be known in order to create a schema, which in turn is required in order to store data. The only way to store types of data not known in advance is to store them as BLOBs, which makes it impossible to use any of the advantages of relational databases later on.

The effort to maintain comes from the fact that if you want to start parking trucks additionally to cars, you need to figure out what are the parts making up a truck and what’s common between cars and trucks before you can store trucks.

Scalability

The main scalability issue with relational databases is that, in order to keep data integrity and support transactions, a relational database needs to synchronize writes across multiple related entities, which requires much more effort when those entities are stored on different servers, especially to handle deadlocks. Making sure that all data related to a single entity stay on a single machine becomes increasingly complex, and the synchronization costs tend to increase quickly as the complexity of the stored data increases.

A big issue regarding scalability arises from the fact that relational databases usually require much more expensive hardware in order to scale. Scaling a relational database with commodity hardware becomes a very difficult task because of the need to support a global lock manager and distributed synchronized writes.

So basically relational databases scale very well on a single server, but problems arise when you need to scale beyond a single-server deployment.

Performance

Since the car was split into individual parts in order to park it in the garage, retrieving the car from the garage means reassembling it from its parts. This is where the performance issue of relational databases comes from. Whenever you need to retrieve an entity and related data, it becomes less efficient if they are stored separately. If you stored them all together, you’d be able to retrieve the whole car much faster.

Since a relational database also needs to make sure that the integrity of the model is maintained and ensure atomicity when storing the different parts of the car, storing the car in the garage is slower than if you just stored the whole car at once.

Both points above are of course also related to the scalability issue since the cost to store and retrieve is increased even more when the different parts are physically stored on different machines.

Different types of NoSQL databases

In order to overcome the shortcomings of relational databases in some scenarios, different types of NoSQL databases came to life. There are basically 4 big groups of NoSQL databases:

Column-oriented databases.

Key-Value stores.

Graph databases.

Document databases.

In the sections below, I’ve listed the characteristics of the different NoSQL database and database products in each category. I’ve only considered products matching the following criteria:

License: Open Source

Supports disk storage

Deployable on Linux, Mac OS X and Windows

Deployable on an own server

The OS requirement is important for me since the final deployment will be on a Linux server (or servers) but development will be done on Mac and Windows. It is also important that the database software doesn’t put unnecessary restrictions on the operating system we’ll use in development and deploy it on in the end.

We also plan to deploy the solution on our own servers so databases which can only be used in combination with a specific cloud offering are not considered.

Column-oriented databases

A column-oriented database stores data tables as a set of columns rather than a set of rows. Such databases are mostly used for data warehouses and CRM systems, where it’s important to be able to aggregate data over large numbers of similar data items.

Cassandra by the Apache Software Foundation seems to be the only database product in this category matching the above criteria. It is released under the Apache license.

Key-Value stores

The data are stored by the application in a schema-less way. Each value is associated with a key which uniquely identifies it.

Unfortunately, although there are many key-value stores on the market, I couldn’t find a single one matching all the criteria above. If you happen to know of such a database please let me know in the comments and I’ll update this post.

Graph databases

They are able to store elements interconnected with an undetermined number of relations between them. They are mostly appropriate for modelling social relationships, maps and transportation.

I could find two databases which match the above criteria.

The first one is Neo4j by Neo Technology. It is released under the GPL. One disadvantage of Neo4j is scalability: it doesn’t seem to be part of the main design of Neo4j, though Neo4j Enterprise has some support for replication, allowing an online backup. Without Neo4j Enterprise, you will need to shut down the database and copy the database files. Neo4j also seems to lack partitioning support.

Both Neo4j and OrientDB support ACID transactions, but only OrientDB supports partitioning. OrientDB also supports replication out of the box, and its license is more developer friendly. So if you do need a graph database but do not have time to evaluate both products, I’d recommend checking OrientDB first.

Document databases

Document databases (also called document stores) store documents encoding data using e.g. XML, YAML, and JSON/BSON (or also as PDF or Microsoft Office files). The documents are stored in collections. These collections are similar to tables in relational databases (the documents being the records in those tables), but unlike relational databases, the documents in these collections do not need to have the same schema. They can actually have completely different fields.

Each document in the database has a unique key used to identify it. But unlike key-value stores, document stores also provide functionality to retrieve documents based on their contents (even though all documents do not have the same attributes/fields).
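The difference can be sketched with plain Python dictionaries (a toy model, not a real document store; the document contents are made up):

```python
# Toy model of a document collection: a unique key per document, no fixed
# schema, and content-based queries which tolerate missing fields.
docs = {
    "doc1": {"type": "car", "brand": "BMW", "wheels": 4},
    "doc2": {"type": "truck", "brand": "MAN", "wheels": 6, "axles": 3},
    "doc3": {"type": "car", "brand": "Audi"},  # no "wheels" field at all
}

# Key-value-store-style access: by key only
truck = docs["doc2"]

# Document-store-style access: by content, despite differing fields
cars = sorted(k for k, d in docs.items() if d.get("type") == "car")
print(cars)  # ['doc1', 'doc3']
```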

There are quite a few document databases matching my above requirements.

MongoDB is released under the AGPL. It’s the most well-known document database on the market. It is used by Craigslist, Foursquare and Shutterfly.
It comes with a lot of functionality:

predefined datatypes

indexes

JavaScript server-side scripting

partitioning

master-slave replication

MapReduce

eventual and immediate consistency

atomic operations within one document

MongoDB supports ad-hoc queries pretty well and its query tools support a lot of what can be done in SQL (of course with the exception of joins). So if you have experience working with an SQL relational database, you should be able to get used to it pretty quickly.

CouchDB by the Apache Software Foundation is released (as expected) under the Apache license. It’s a document store inspired by Lotus Notes. It is used by quite a few organizations, but no big names like MongoDB or Couchbase.

Compared to MongoDB, it does support a few more operating systems (e.g. Android and BSD). But it does not support the following:

predefined datatypes

immediate consistency

But it does support the following which is not supported by MongoDB:

triggers

master-master replication

CouchDB is a single node solution with peer-to-peer replication technology and is better suited for decentralized systems. So if you do not need immediate consistency and need master-master replication to build a decentralized system, CouchDB might be a better fit than MongoDB.

Couchbase is a JSON-based document store derived from CouchDB with a Memcached-compatible interface and is released under an Apache license. It is used by many companies including Adidas, Adobe, Aol, BMW, Cisco, Ebay, Intel, Mozilla, Nokia, Vodafone and Zynga.

Compared to MongoDB it doesn’t support deployment on Solaris and also lacks predefined datatypes.

Being based on CouchDB, it also supports triggers and master-master replication but also supports immediate consistency like MongoDB (and which isn’t supported by CouchDB).

Couchbase additionally has a built-in clustering system and can spread data automatically across multiple nodes.

Also since Couchbase provides built-in Memcached-based caching, it is usually better suited for use cases where low latency or high throughput is a requirement.

If easy scalability and high throughput are important to you but you do not want to sacrifice immediate consistency, then Couchbase might be the right solution for you.

But you should keep in mind that Couchbase is not entirely open-source. There are two versions: Community Edition (free but no latest bug fixes) and Enterprise Edition (with additional restrictions). If you plan to use the Enterprise Edition, you should carefully read the license terms.

ArangoDB is released under an Apache license. It supports both disk and RAM-based storage of JSON data.

Compared to MongoDB, it does not provide support for MapReduce but supports the ArangoDB query language which allows using aggregation, graph queries, grouping, joins, list iteration, results filtering, results projection, sorting and variables. It also supports ACID transactions.

An advantage of ArangoDB is that it supports database models based on graphs, key-values and documents.

Summary

Since our goal is to store multiple XML document types which might have different schemas and need to be able to generate reports based on their contents, our obvious choice is to go for a document oriented database. This leaves us with 4 database products to choose from:

MongoDB

CouchDB

Couchbase

ArangoDB

Since not all non-functional requirements are yet available, we need to try and define sets of possible requirements and for each of them define the corresponding product which would be our favorite.

First, ArangoDB seems to be a very good product judging from its supported functionality and architecture. But it’s quite a new product (initial release in 2011) and doesn’t have the kind of user community the other 3 products have. Looking at Google Trends, you will also get the following figures for March 2014:

MongoDB: 100

CouchDB: 8

Couchbase: 7

ArangoDB: 0

Of course, if you read this article in a few years from now, the situation will most probably be different. Looking into my crystal ball, I’d say that in a year or two from now, ArangoDB will be up a little bit, interest for CouchDB will further move to Couchbase and MongoDB will still be number one but by not as much as now.

Also since scalability is very important, we would rather tend to use Couchbase than CouchDB. Of course we need to further analyze the differences between the Community and Enterprise editions and also check the exact terms of the license for the enterprise edition.

The only two things speaking against MongoDB seem to be:

The AGPL license which I’ve always found scary.

Scaling with MongoDB seems to be more complex than with Couchbase.

Right now, I am not 100% sure whether to go for Couchbase or MongoDB. Both seem to meet all our requirements and we probably need to give them both a try and see which one is the perfect fit.

First let’s have a look at what SQL injection is about. SQL injection means that an attacker injects pieces of SQL code into a call to a server, instead of just sending plain text information, in order to get around security mechanisms or to perform something which shouldn’t be allowed.

Here’s a very simple example. Let’s say you have a very poorly programmed login function which is called with two parameters, a user name and a password. If you take the parameters and build an SQL statement like this:

$query = "SELECT 1 FROM users WHERE user_id='".$user_name."' AND password='".$password."'";

An attacker may send the following:

User name: admin' --

Password: anything

The generated SQL query would be:

SELECT 1 FROM users WHERE user_id='admin' --' AND password='anything'

The double dash makes the rest of the line a comment, so the statement always returns 1, allowing the attacker to log in as admin without valid credentials.

An easy fix for this security issue is not to return 1 but to return an MD5 hash of the password and compare it with the password provided. Unfortunately, it is also trivial to work around such security fixes. Let’s say your statement now looks like this:

SELECT password FROM user WHERE user_name='xxx'

All the attacker has to do is to use the following username: admin' AND 1=0 UNION SELECT 'known_md5_checksum
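This trick can be reproduced with Python and sqlite3 (for illustration only; the user table and user_name column come from the statement above, the password values are made up):

```python
import sqlite3
import hashlib

# A user table storing MD5 password hashes, as in the "fixed" version above
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE user (user_name TEXT, password TEXT)")
cur.execute("INSERT INTO user VALUES ('admin', ?)",
            (hashlib.md5(b"real_password").hexdigest(),))

# The attacker appends a UNION returning an MD5 checksum he knows
attacker_md5 = hashlib.md5(b"anything").hexdigest()
user_name = "admin' AND 1=0 UNION SELECT '" + attacker_md5
query = "SELECT password FROM user WHERE user_name='" + user_name + "'"

# The first branch matches no rows (1=0), so only the attacker's value
# is returned, and the server compares against a checksum chosen by him
row = cur.execute(query).fetchone()
print(row[0] == attacker_md5)  # True
```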

But SQL injection is not only used to log in without credentials. It can also be used to perform actions which are not intended to be allowed. This is typically done using batched queries, i.e. closing the first query and having a second query executed which either returns sensitive information or destroys something, e.g. using a user name like: admin'; DROP TABLE important_table --

The first statement would be executed normally and the DROP TABLE would then be executed additionally. Fortunately, when using PHP and MySQL, such batched queries are not supported: the execution fails since you can only have one statement executed at a time. But if you used PostgreSQL instead of MySQL, it would be possible.
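Python's sqlite3 module behaves the same way and refuses batched statements; a quick check (the table name is taken from the example above):

```python
import sqlite3

# Batched queries are rejected: execute() only accepts a single statement,
# so the injected DROP TABLE never runs.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE important_table (id INT)")

error = None
try:
    cur.execute("SELECT id FROM important_table; DROP TABLE important_table")
except (sqlite3.Warning, sqlite3.ProgrammingError) as exc:
    # older Python versions raise sqlite3.Warning here,
    # newer ones sqlite3.ProgrammingError
    error = exc

tables = [r[0] for r in
          cur.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(error is not None, tables)
```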

Another thing which is often done using SQL injection is getting access to data in the database which should be protected. This is usually done using a kind of UNION injection. The idea behind it is that:

the server you are attacking is fetching data using a query and displaying this data in a tabular form

you inject a UNION clause to fetch data from another table

the data from both table are displayed in the table on the client

Let’s say you have an order table and you can call the server to display all items in a particular order with a statement taking the order_id as parameter and concatenating it into the query.

If the attacker now sends the following as order_id, he will get a list of all users and their passwords: 1 UNION SELECT name, password FROM users
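A sqlite3 sketch of this UNION exfiltration (the order_items table and its columns are assumptions for illustration; the users table matches the injected statement above):

```python
import sqlite3

# A harmless order table and a sensitive user table in the same database
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE order_items (order_id INT, item TEXT, qty TEXT)")
cur.execute("INSERT INTO order_items VALUES (1, 'widget', '2')")
cur.execute("CREATE TABLE users (name TEXT, password TEXT)")
cur.execute("INSERT INTO users VALUES ('alice', 'hunter2')")

# The order_id parameter is concatenated unquoted into the statement
order_id = "1 UNION SELECT name, password FROM users"
query = "SELECT item, qty FROM order_items WHERE order_id=" + order_id
rows = cur.execute(query).fetchall()
print(rows)  # the user credentials show up next to the order items
```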

I do hope the passwords will be at least encrypted but encryption alone is not always enough to protect data. You also need to make sure the encrypted data cannot be accessed that easily.

Of course an attacker will need to know which kind of statements are executed by your software in order to exploit such a security hole. But it might not be as difficult as you think… I’ll post another article about how you can manage to get information about the query being executed later.

Many database engines also provide the functionality to execute commands in the operating system from SQL. The commands are executed as the user running the database engine. This is sometimes very useful, but in an SQL injection scenario it may allow an attacker not only to steal information or damage the database but also to damage the operating system. Fortunately, in our case the xp_cmdshell command used to do this (in MS SQL Server and Sybase) does not exist in MySQL.

So what is to be done to protect yourself against SQL injection attacks?

First, if you use MS SQL Server or Sybase: to prevent an attacker from destroying the whole server or getting access to any file on the computer, you should disable xp_cmdshell or run the database engine as a user with very limited access rights to the rest of the system.

Now, let’s get back to MySQL and PHP. Here you should use prepared statements and parameterized queries using PDO or Mysqli. These statements are sent to the database engine and parsed independently of anything else. This way, it is not possible to inject SQL code via a parameter.

Note that PDO is an extension but it is bundled by default since PHP 5.1 and a MySQL driver is also available by default.

This is basically the best way to secure your software. In case you cannot use PDO or Mysqli, there are other techniques to prevent SQL injection attacks. I’ll list them in an update to this post in a few days.
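The contrast between a string-built query and a parameterized one can be shown with Python's sqlite3 (an illustration of the principle only, not the PHP PDO/Mysqli API; table and values are taken from the login example above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (user_id TEXT, password TEXT)")
cur.execute("INSERT INTO users VALUES ('admin', 'S3cret!')")

def vulnerable_login(user, pwd):
    # DON'T: input concatenated straight into the statement text
    query = ("SELECT 1 FROM users WHERE user_id='" + user
             + "' AND password='" + pwd + "'")
    return cur.execute(query).fetchone() is not None

def safe_login(user, pwd):
    # DO: placeholders are parsed separately from the statement text
    query = "SELECT 1 FROM users WHERE user_id=? AND password=?"
    return cur.execute(query, (user, pwd)).fetchone() is not None

# The classic "admin' --" input bypasses only the vulnerable version
print(vulnerable_login("admin' --", "anything"))  # True
print(safe_login("admin' --", "anything"))        # False
```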

Sybase ASE supports both the old syntax and the newer SQL-92 syntax for left outer joins:

Old syntax:

SELECT * FROM table1, table2 WHERE table1.key*=table2.fkey

New syntax:

SELECT * FROM table1 LEFT JOIN table2 ON table1.key=table2.fkey

As long as you do not have other criteria, the results will be the same. But you might experience some differing results as soon as you add other criteria, e.g. the two following statements (shown with placeholder tables p and e) seem to do the same but deliver different results:

SELECT * FROM p, e WHERE p.p_key*=e.p_key AND e.field1='V1'

SELECT * FROM p LEFT JOIN e ON p.p_key=e.p_key WHERE e.field1='V1'

The reason is that, when executing the first statement, the database engine does not only consider p.p_key=e.p_key as a join criterion but also e.field1='V1'. So basically the first statement is equivalent to the following SQL-92 statement:

SELECT * FROM p LEFT JOIN e ON p.p_key=e.p_key AND e.field1='V1'

Note that the second criterion is not in the where clause but in the on part.

So the old left outer join syntax is more compact, but it is ambiguous as it doesn’t properly separate the join criteria from the where criteria. In case of a left outer join this makes a huge difference, since the join criteria do not filter the returned rows but the where criteria do.

In most cases, the results you were after are the ones returned by the first and last queries above. But you should avoid the old left outer join syntax and try to use the SQL-92 syntax everywhere. It makes it clearer what you mean with the statement and can save some time searching why you did not get the output you were expecting. But also with the SQL-92 syntax you should carefully think whether you want to add a criterion to the join criteria or to the where clause (and as stated above in most cases when using a left outer join, the criteria on the joined tables should probably go in the join criteria).
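The ON-versus-WHERE difference can be demonstrated with sqlite3 (placeholder tables p and e, matching the aliases used above):

```python
import sqlite3

# Two rows in p; e contains no row with field1='V1' at all
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE p (p_key INT)")
cur.execute("CREATE TABLE e (p_key INT, field1 TEXT)")
cur.executemany("INSERT INTO p VALUES (?)", [(1,), (2,)])
cur.execute("INSERT INTO e VALUES (1, 'V2')")

# Criterion in the ON part: every row of p is kept, e columns become NULL
on_rows = cur.execute(
    "SELECT p.p_key, e.field1 FROM p LEFT JOIN e "
    "ON p.p_key = e.p_key AND e.field1 = 'V1'"
).fetchall()

# Criterion in the WHERE part: the NULL rows are filtered out afterwards
where_rows = cur.execute(
    "SELECT p.p_key, e.field1 FROM p LEFT JOIN e "
    "ON p.p_key = e.p_key WHERE e.field1 = 'V1'"
).fetchall()

print(sorted(on_rows))  # both p rows survive, with NULL e columns
print(where_rows)       # []
```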

I’ve compiled in this short ebook all posts I’ve ever written regarding Sybase ASE and tried to cluster them a little bit. This ebook is full of information useful when you work daily or once in a while with Sybase ASE. Sybase ASE is a high-performance and scalable database management system but it is also complex and not always easy to understand.

After working for over 10 years with ASE, I’ve gathered a lot of information I share on a daily basis with colleagues and once in a while on my blog at http://benohead.com.

I provide this ebook in the hope that it will be helpful. It is thus available for free. Since I’m not a professional writer, I do not have a crew of people proof-reading it. So the spelling might not be as good as it should be and I can’t be sure that everything in there is 100% accurate. If you find any mistake, please contact me at henri.benoit@gmail.com and I’ll do my best to update it.

In order to peek into the procedure cache, you can use the following dbcc command:

dbcc procbuf

In order to see the output on the console, use:

dbcc traceon(3604)
go
dbcc procbuf
go

You’ll see that the output is pretty extensive. If what you are after is which triggers and procedures are using space in the procedure cache and how much space they use, you are only interested in lines like:

Archive databases are used to access data from a backup file directly without having to restore the database. Let’s say you lost some data in a table but have made many other changes to other tables since the last backup. Just loading the last backup is not an option since you’d lose everything done since then. Of course, if you work with transaction log dumps, you can reduce the loss of data, but very often it’s still too much. Additionally, in some cases you know the data you want to reload have not changed since the last backup (i.e. some kind of master data). So the best solution would be to keep the current database but just reload this one table. Or maybe you do not even want to reload a complete table but just copy a few deleted lines back into a table.

That’s exactly what an archive database is for. You cannot dump an archive database. An archive database is just a normal database dump loaded in a special way so that you can access the data without having to do a regular load of the dump which would overwrite everything.

So what do you need in order to mount a database as an archive database? Well, you need two additional databases:

A “scratch database”

An archive database

The “scratch database” is a small database used to store a system table called sysaltusages. This table maps the database dump files you are loading to the archive database.

The archive database is an additional database used to store “modified pages”, i.e. pages created in addition to the pages stored in the dump files, e.g. as the result of the recovery performed after loading the database dump. This database is thus typically much smaller than the dump files you are loading, but it is difficult to tell upfront how big it will be.

So once you have loaded an archive database, the data you see come from these three sources:

The loaded dump files

The scratch database

The archive database

So let’s first create the two databases (I assume here you have some devices available to create these databases).

Replace scratch_data_dev, scratch_log_dev and archive_data_dev by the names of the devices you want to create the data and log fragments of the scratch database and the data fragment of the archive database.

Now we’re ready to load the dump. Just do it the way you would load the database to restore it but only load it to the just created archive database e.g.:

load database archivedb from '...'

You can of course also load the transaction logs with:

load transaction archivedb from '...'

Note that while loading the database dump or the transaction log dumps, you might get an error message saying that either the transaction log of the scratch database or the modified pages section of the archive database ran full, e.g.:

There is no more space in the modified pages section for the archive database ‘pdir_archive_db’. Use the ALTER DATABASE command to increase the amount of space available to the database.

If you do not care about the recovery and have limited storage available for the archive database you can use:

load database archivedb from '...' with norecovery

Loading with norecovery also reduces the time required to load. Also the database is automatically brought online (this also means you cannot load additional transaction logs). The downside is that the database might be inconsistent (from a physical and transactional point of view).

If you did not use the norecovery option, you have to bring the archive database online:

online database archivedb

Once you are done with the archive database and do not need it anymore, you can just drop both databases with a regular drop database command.
