Explorations in manipulation

Monday, 22 January 2018

Recently, I picked up Hanyu Da Cidian in .DSL (ABBYY Lingvo) format somewhere, probably on the Russian Interwebs. I bought the Pleco version of this great dictionary a while ago, so I didn't care much about figuring out its copyright status. Anyhow, since I do not use Lingvo at all, I converted it to the native OS X Dictionary format.

Being an old-school nerd (information still wants to be free, no matter how successful the corpos were in convincing millennials otherwise), I don't mind sharing it. Besides, I'll be extremely flattered if this causes some Chinese corpo to send Google a DMCA takedown request!

Unpack the ZIP, put the resulting .dictionary package in ~/Library/Dictionaries/ or wherever your custom dictionaries are kept, and enable it in Dictionary preferences. Note that it is in simplified, not traditional, characters.

Sunday, 19 November 2017

An infrequent progress report. These days I could probably pass HSK5, and I'm wondering whether it's even worth continuing with these HSK tests. They are so geared towards students planning to study and get their first job in China (har har) that they are not very useful in assessing overall language proficiency. I've heard good things about TOCFL, but then again it is likely very much geared towards future students in Taiwan. We will see.

After finishing Integrated Chinese I moved on to Reading Into a New China and have just started the second volume. With a bit of ChinesePod sprinkled in, my least developed area is probably writing. Language production is always less developed than comprehension, and as for speaking, I don't have that much of a desire to develop eloquence in spoken Chinese, at least not yet! Plus a few hours a week online with a tutor, who doesn't focus on speaking either, but we talk nearly exclusively in Mandarin, which sort of helps.

I'm thinking about what to do about writing, other than the obvious - write a lot. The trouble is, the stuff I can write is rather limited in vocabulary (according to Anki, my passive vocabulary is just under 4k words and 2k characters) and will end up reading like a first-grader's diary :D With a lot of mistakes, naturally.

Perhaps sticking to reading as much as I can will be best for the next (half?) year. While flipping through talks by Steve Kaufmann I saw him mention a few nice, though old, readers, e.g. Intermediate Reader in Modern Chinese and Twenty Lectures on Chinese Culture. In traditional characters (I'm not looking for an easy way out hah)! What I don't like about most modern Chinese readers is that they are extremely utilitarian, especially the textbooks published in mainland China itself. Not only do they (China-published books) stick to the repertoire of "read text, memorise the list of words, fill in these blanks", but the topics of their texts also largely revolve around "how wonderful it is for a foreigner to study and work in our great country". Blergh. Princeton readers and many recent Cheng & Tsui books are more palatable to me. Plus I'll probably have to learn a bit of the traditional charset for the old books above.

So those are on the to-(try to)read list. Plus tentatively Classical Chinese (in simplified, thank gods). Classical more because the further I get away from beginner-ish dialogues, the more I realise how much of modern written Chinese is rooted in 文言文. No matter the charset.

Oh and why I even started writing this post: I wanted to share my nerdy Anki model for Chinese cards. Here it goes - it started with the stock Chinese Support model but grew a bit over the last year. It's not beginner-friendly and has some minimal Russian language usage. I find some Chinese words and sentences make more sense when translated into Russian than into English. Probably a native language bias.

Thursday, 25 May 2017

I wouldn't call this "back" exactly though. Re-emerging with a completely non-infosex related topic after a year of learning Chinese (Mandarin, or 普通话, strictly speaking). Also, no links here - use Google if you are a curious type.

After a year of lazy learning (I was somewhat busy for half of 2015, first nearly dying and then recovering from an accident) and another year of studying reasonably seriously I'm at HSK4 level. Which isn't really as much as advertised, but that's a separate topic.

This motivates me to wax philosophical a little. Given all my applications of the "trial and discover" method, I am certain that there are few to no language-specific tricks to learning Chinese. All such advice can be applied to, or rather derived from, the learning of other languages.

That is, all language learning is essentially the same - a lot of practice and repetition, preferably a long-term immersion. I've learned a few (human!) languages to various degrees of usefulness in the past, and it always worked that way.

Chinese is especially difficult because of

a) its peculiar writing system

b) complete lack of cognates with Indo-European (IE) languages and

c) a somewhat alien to a Western (IE, really) speaker grammar.

On the last point - it depends on your mother tongue, with English speakers being hit the heaviest among major IE language groups. At least, complaints in English are the most vocal! Thank gods I'm a native Russian speaker and, intriguingly, there are some concepts that are sort of common with Chinese - e.g. verbal aspect (~了) is not an entirely alien concept; "verbal complements" in Mandarin correspond to verbal prefixes in Russian, and so on.

The Chinese-specific trick or two I learned are all related to writing: learn to write some characters, maybe a few hundred (but prioritise using a pinyin IME), and learn the most common 150-ish radicals - this will make learning characters easier. Skritter is nice, partially because they record the state of all characters you ever learn, no matter whether you add or remove the lists you're studying from.

Reading and listening comprehension - well, read and listen a lot; maybe prioritise learning words over single characters they consist of (not sure about the latter though). Zhongwen is a good Chrome plugin and ChineseGrammarWiki is a good online grammar.

Speaking - the same. Find a tutor, there are tons online, eg. on iTalki. A good piece of software to visualise tones is SpeakGoodChinese (or Praat, which it is based on).

That plus all of the generic language learning advice like "you learn what you practice", "find the group/book(Integrated Chinese for me)/method that works for you", "use SRS(Anki) if it's your thing". And motivation, which is the key with any learning.

Wednesday, 10 August 2016

A summary (copy-paste) of all "auditing tips" from the (still!) awesome TAOSSA book

Ch 6

Auditing Tip: Type Conversions
Even those who have studied conversions extensively might still be surprised at the way a compiler renders certain expressions into assembly. When you see code that strikes you as suspicious or potentially ambiguous, never hesitate to write a simple test program or study the generated assembly to verify your intuition.
If you do generate assembly to verify or explore the conversions discussed in this chapter, be aware that C compilers can optimize out certain conversions or use architectural tricks that might make the assembly appear incorrect or inconsistent. At a conceptual level, compilers are behaving as the C standard describes, and they ultimately generate code that follows the rules. However, the assembly might look inconsistent because of optimizations or even incorrect, as it might manipulate portions of registers that should be unused.

Auditing Tip: Signed/Unsigned Conversions
You want to look for situations in which a function takes a size_t or unsigned int length parameter, and the programmer passes in a signed integer that can be influenced by users. Good functions to look for include read(), recvfrom(), memcpy(), memset(), bcopy(), snprintf(), strncat(), strncpy(), and malloc(). If users can coerce the program into passing in a negative value, the function interprets it as a large value, which could lead to an exploitable condition.
Also, look for places where length parameters are read from the network directly or are specified by users via some input mechanism. If the length is interpreted as a signed variable in parts of the code, you should evaluate the impact of a user supplying a negative value.
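To make the signed-to-unsigned pattern concrete, here is a minimal sketch. The function names, the `copy_record()` helper and the 64-byte limit are all made up for illustration and are not from the book.

```c
#include <stddef.h>
#include <string.h>

#define RECORD_MAX 64   /* hypothetical buffer size */

/* What memcpy() would receive if a signed length were passed straight
 * through: the implicit conversion to size_t turns -1 into SIZE_MAX. */
size_t length_as_seen_by_memcpy(int user_len)
{
    return (size_t)user_len;
}

/* Flawed check: only the upper bound is tested, so a negative user_len
 * is accepted. Returns 1 if the check lets the length through. */
int flawed_check_passes(int user_len)
{
    return user_len <= RECORD_MAX;
}

/* Safer variant: reject negative lengths before any conversion happens. */
int copy_record(char *dst, const char *src, int user_len)
{
    if (user_len < 0 || user_len > RECORD_MAX)
        return -1;
    memcpy(dst, src, (size_t)user_len);
    return 0;
}
```

With these definitions, a length of -1 passes the flawed check and would reach memcpy() as SIZE_MAX, while `copy_record()` refuses it.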
As you review functions in an application, it’s a good idea to note the data types of each function’s arguments in your function audit log. This way, every time you audit a subsequent call to that function, you can simply compare the types and examine the type conversion tables in this chapter’s “Type Conversions” section to predict exactly what’s going to happen and the implications of that conversion. You learn more about analyzing functions and keeping logs of function prototypes and behavior in Chapter 7, “Program Building Blocks.”

Auditing Tip: Sign Extension
When looking for vulnerabilities related to sign extensions, you should focus on code that handles signed character values or pointers or signed short integer values or pointers. Typically, you can find them in string-handling code and network code that decodes packets with length elements. In general, you want to look for code that takes a character or short integer and uses it in a context that causes it to be converted to an integer. Remember that if you see a signed character or signed short converted to an unsigned integer, sign extension still occurs.
As mentioned previously, one effective way to find sign-extension vulnerabilities is to search the assembly code of the application binary for the movsx instruction. This technique can often help you cut through multiple layers of typedefs, macros, and type conversions when searching for potentially vulnerable locations in code.

Auditing Tip: Truncation
Truncation-related vulnerabilities are typically found where integer values are assigned to smaller data types, such as short integers or characters. To find truncation issues, look for locations where these shorter data types are used to track length values or to hold the result of a calculation. A good place to look for potential variables is in structure definitions, especially in network-oriented code.
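As an illustration, here is a sketch with a hypothetical `struct pkt_hdr` that tracks a length in an unsigned short. The structure and function names are invented for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical packet header that tracks a length in a short field. */
struct pkt_hdr {
    unsigned short len;
};

/* Storing a size_t in the short field silently keeps only the low bits.
 * With a 16-bit unsigned short, 65546 is stored as 10. */
unsigned short stored_length(size_t actual)
{
    struct pkt_hdr h;
    h.len = (unsigned short)actual;
    return h.len;
}

/* Safer: refuse values that do not fit the field. */
int set_length(struct pkt_hdr *h, size_t actual)
{
    if (actual > USHRT_MAX)
        return -1;
    h->len = (unsigned short)actual;
    return 0;
}
```

Any later allocation or copy sized from `h.len` would then disagree with the amount of data actually present.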
Programmers often use a short or character data type just because the expected range of values for a variable maps to that data type nicely. Using these data types can often lead to unanticipated truncations, however.

Auditing Tip
Reviewing comparisons is essential to auditing C code. Pay particular attention to comparisons that protect allocation, array indexing, and copy operations. The best way to examine these comparisons is to go line by line and carefully study each relevant expression.
In general, you should keep track of each variable and its underlying data type. If you can trace the input to a function back to a source you’re familiar with, you should have a good idea of the possible values each input variable can have. Proceed through each potentially interesting calculation or comparison, and keep track of potential values of the variables at different points in the function evaluation. You can use a process similar to the one outlined in the previous section on locating integer boundary condition issues.
When you evaluate a comparison, be sure to watch for unsigned integer values that cause their peer operands to be promoted to unsigned integers. sizeof and strlen() are classic examples of operands that cause this promotion.
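A sketch of how a sizeof operand can quietly defeat a check. The scenario (a packet length that includes a header the size of a short) and the function names are assumptions made for illustration.

```c
#include <stddef.h>

/* Intended: reject packets whose payload (len minus a header the size of
 * a short) is empty or negative. Because sizeof yields size_t, len is
 * converted to an unsigned type and the subtraction can never be
 * negative, so the test is only true when len == sizeof(short) exactly.
 * Returns 1 if the packet is rejected. */
int flawed_reject(int len)
{
    return (len - sizeof(short)) <= 0;
}

/* Fixed: do the comparison in signed arithmetic. */
int fixed_reject(int len)
{
    return len <= (int)sizeof(short);
}
```

A length one smaller than the header wraps to a huge unsigned value in `flawed_reject()` and is accepted, while `fixed_reject()` rejects it.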
Remember to keep an eye out for unsigned variables used in comparisons, like the following:

if (uvar < 0) ...
if (uvar <= 0) ...

The first form typically causes the compiler to emit a warning, but the second form doesn’t. If you see this pattern, it’s a good indication something is probably wrong with that section of the code. You should do a careful line-by-line analysis of the surrounding functionality.

Auditing Tip: sizeof
Be on the lookout for uses of sizeof in which developers take the size of a pointer to a buffer when they intend to take the size of the buffer. This often happens because of editing mistakes, when a buffer is moved from being within a function to being passed into a function.
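A minimal sketch of that editing mistake, using invented names: the same `sizeof(name)` expression means two different things depending on whether `name` is an array or a pointer parameter.

```c
#include <stddef.h>

#define NAME_LEN 128   /* hypothetical buffer size */

/* After the buffer was moved into a parameter, sizeof(name) is the size
 * of a pointer, not the size of the buffer it points to. */
size_t size_seen_in_callee(char *name)
{
    return sizeof(name);          /* sizeof(char *) */
}

/* While the buffer was still a local array, the same expression gave
 * the full buffer size. */
size_t size_seen_in_caller(void)
{
    char name[NAME_LEN];
    return sizeof(name);          /* NAME_LEN */
}
```

Code such as `strncpy(name, src, sizeof(name))` therefore changes meaning silently when `name` becomes a parameter.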
Again, look for sizeof in expressions that cause operands to be converted to unsigned values.

Auditing Tip: Unexpected Results
Whenever you encounter a right shift, be sure to check whether the left operand is signed. If so, there might be a slight potential for a vulnerability. Similarly, look for modulus and division operations that operate with signed operands. If users can specify negative values, they might be able to elicit unexpected results.

Auditing Tip
Pointer arithmetic bugs can be hard to spot. Whenever an arithmetic operation is performed that involves pointers, look up the type of those pointers and then check whether the operation agrees with the implicit arithmetic taking place. In Listing 6-29, has sizeof() been used incorrectly with a pointer to a type that’s not a byte? Has a similar operation happened in which the developer assumed the pointer type won’t affect how the operation is performed?
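Listing 6-29 is not reproduced in these notes, so the following is only a sketch of the general mistake, with invented names: scaling by sizeof() when the pointer type already scales the arithmetic.

```c
#include <stddef.h>

/* Byte distance covered by "p + 4 * sizeof(int)" when p is an int *.
 * The developer meant to skip 4 ints, but the pointer type already
 * scales by sizeof(int), so this skips 4 * sizeof(int) ints. The array
 * is sized so the pointer stays in bounds for sizeof(int) up to 16. */
ptrdiff_t bytes_advanced_flawed(void)
{
    int buf[64];
    int *p = buf;
    int *q = p + 4 * sizeof(int);
    return (char *)q - (char *)p;
}

/* Correct: let the pointer type do the scaling. */
ptrdiff_t bytes_advanced_correct(void)
{
    int buf[64];
    int *p = buf;
    int *q = p + 4;
    return (char *)q - (char *)p;
}
```

With a 4-byte int the flawed version moves 64 bytes instead of the intended 16.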

Ch 7

Auditing Tip
When data copies in loops are performed with no size validation, check every code path leading to the dangerous loop and determine whether it can be reached in such a way that the source buffer can be larger than the destination buffer.

Auditing Tip
Mark all the conditions for exiting a loop as well as all variables manipulated by the loop. Determine whether any conditions exist in which variables are left in an inconsistent state. Pay attention to places where the loop is terminated because of an unexpected error, as these situations are more likely to leave variables in an inconsistent state.

Auditing Tip
Determine what each variable in the definition means and how each variable relates to the others. After you understand the relationships, check the member functions or interface functions to determine whether inconsistencies could occur in identified variable relationships. To do this, identify code paths in which one variable is updated and the other one isn’t.

Auditing Tip
When variables are read, determine whether a code path exists in which the variable is not initialized with a value. Pay close attention to cleanup epilogues that are jumped to from multiple locations in a function, as they are the most likely places where vulnerabilities of this nature might occur. Also, watch out for functions that assume variables are initialized elsewhere in the program. When you find this type of code, attempt to determine whether there’s a way to call functions making these assumptions at points when those assumptions are incorrect.

Ch 8

Auditing Tip
When attempting to locate format string vulnerabilities, search for all instances of printf(), err(), or syslog() functions that accept a nonstatic format string argument, and then trace the format argument backward to see whether any part can be controlled by attackers.
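For reference, a sketch of the safe pattern next to the unsafe one. `format_user_message()` is a hypothetical helper, not something from the book.

```c
#include <stdio.h>

/* Unsafe pattern (shown only as a comment, do not use):
 *     snprintf(out, outsz, user);    // user controls the format string
 * Safe pattern: the format is a string literal and the user data is
 * passed as an argument, so conversion specifiers in it are not
 * interpreted. */
int format_user_message(char *out, size_t outsz, const char *user)
{
    return snprintf(out, outsz, "%s", user);
}
```

Passing a string such as `%x%x%n` through the helper copies it literally into the output buffer.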
If functions in the application take variable arguments and pass them unchecked to printf(), syslog(), or err() functions, search every instance of their use for nonstatic format string arguments in the same way you would search for printf() and so forth.

Auditing Tip
You might find a vulnerability in which you can duplicate a file descriptor. If you have access to an environment similar to one in which the script is running, use lsof or a similar tool to determine what file descriptors are open when the process runs. This tool should help you see what you might have access to.

Auditing Tip
Code that uses snprintf() and equivalents often does so because the developer wants to combine user-controlled data with static string elements. This use may indicate that delimiters can be embedded or some level of truncation can be performed. To spot the possibility of truncation, concentrate on static data following attacker-controllable elements that can be of excessive length.

Auditing Tip
When auditing multicharacter filters, attempt to determine whether building illegal sequences by constructing embedded illegal patterns is possible, as in Listing 8-26.
Also, note that these attacks are possible when developers use a single substitution pattern with regular expressions, such as this example:

$path =~ s/\.\.\///g;

This approach is prevalent in several programming languages (notably Perl and PHP).
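Listing 8-26 is not included here, so as a rough stand-in, this C sketch mimics what a single-pass substitution like the one above does. The function name is invented.

```c
#include <string.h>

/* Single-pass removal of every "../" in `in`, roughly what the
 * substitution s/\.\.\///g does. `out` must hold at least
 * strlen(in) + 1 bytes. Removed matches are not rescanned. */
void strip_dotdot_once(const char *in, char *out)
{
    while (*in) {
        if (strncmp(in, "../", 3) == 0) {
            in += 3;              /* drop the match and move on */
        } else {
            *out++ = *in++;       /* keep everything else */
        }
    }
    *out = '\0';
}
```

Feeding it `....//etc/passwd` yields `../etc/passwd`: the filter removes the embedded `../` and in doing so assembles a new one from the surrounding characters.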

Ch 9

Auditing Tip
The access() function usually indicates a race condition because the file it checks can often be altered before it’s actually used. The stat() function has a similar problem.

Auditing Tip
It’s a common misunderstanding to think that the less specific permission bits are consulted if the more specific permissions prevent an action.
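A simplified model of the classic UNIX check shows why. This is a sketch only: it ignores root, ACLs and supplementary groups, and `may_access()` is a made-up name.

```c
/* Simplified model of the classic UNIX permission check.
 * `want` is a 3-bit rwx mask, e.g. 04 for read. Exactly one class of
 * bits is consulted: owner, else group, else other. */
int may_access(unsigned uid, unsigned gid,
               unsigned file_uid, unsigned file_gid,
               unsigned mode, unsigned want)
{
    unsigned bits;

    if (uid == file_uid)
        bits = (mode >> 6) & 7;   /* owner bits only */
    else if (gid == file_gid)
        bits = (mode >> 3) & 7;   /* group bits only */
    else
        bits = mode & 7;          /* other bits */
    return (bits & want) == want;
}
```

For a file with mode 0077, the owner is denied read access even though everyone else is allowed it.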

Ch 10

Auditing Tip
When auditing code that’s running with special privileges or running remotely in a way that allows users to affect the environment, verify that any call to execvp() or execlp() is secure. Any situation in which full pathnames aren’t specified, or the path for the program being run is in any way controlled by users, is potentially dangerous.

Auditing Tip
Carefully check for any privileged application that writes to a file without verifying whether writes are successful. Remember that checking for an error when calling write() might not be sufficient; they also need to check whether the amount of bytes they wrote were successfully stored in their entirety. Manipulating this application’s rlimits might trigger a security vulnerability by cutting the file short at a strategically advantageous offset.

Auditing Tip
Never assume that a condition is unreachable because it seems unlikely to occur. Using rlimits is one way to trigger unlikely conditions by restricting the resources a privileged process is allowed to use and potentially forcing a process to die when a system resource is allocated where it usually wouldn’t be. Depending on the circumstances of the error condition you want to trigger, you might be able to use other methods by manipulating the program’s environment to force an error.

Ch 14

Auditing Tip
Examine the TCP sequence number algorithm to see how unpredictable it is. Make sure some sort of cryptographic random number generator is used. Try to determine whether any part of the key space can be guessed deductively, which limits the range of possible correct sequence numbers. Random numbers based on system state (such as system time) might not be secure, as this information could be procured from a remote source in a number of ways.

Ch 17

Auditing Tip
Examine all exposed static HTML and the contents of dynamically generated HTML to make sure nothing that could facilitate an attack is exposed unnecessarily. You should do your best to ensure that information isn’t exposed unnecessarily, but at the same time, look out for security mechanisms that rely on obscurity because they are prone to fail in the Web environment.

Auditing Tip
Look at each page of a Web application as though it exists in a vacuum. Consider every possible combination of inputs, and look for ways to create a situation the developer didn’t intend. Determine if any of these unanticipated situations cause a page to use the input without first validating it.

Auditing Tip
Always consider what can happen if attackers visit the pages of a Web application in an order the developer didn’t intend. Can you bypass certain security checks by skipping past intermediate verification pages to the functionality that actually performs the processing? Can you take advantage of any race conditions or cause unanticipated results by visiting pages that use session data out of order? Does any page trust the validity of information that users control?

Auditing Tip
First, focus on content that’s available without any kind of authentication because this code is most exposed to Internet-based attackers. Then study the authentication system in depth, looking for any kind of issue that lets you access content without valid credentials.

Auditing Tip
When reviewing authorization, you need to ensure that it’s enforced consistently throughout the application. Do this by enumerating all privilege levels, user roles, and privileges in use.

Auditing Tip
Although this sample application might seem very contrived, it is actually representative of flaws that are quite pervasive throughout modern Web applications. You want to look for two patterns when reviewing Web applications:

The Web application takes a piece of input from the user, validates it, and then writes it to an HTML page so that the input is sent to the next page. Web developers often forget to validate the piece of information in the next page, as they don’t expect users to change it between requests. For example, say a Web page takes an account number from the user and validates it as belonging to that user. It then writes this account number as a parameter to a balance inquiry link the user can click. If the balance inquiry page doesn’t do the same validation of the account number, the user can just change it and retrieve account information for other users.

The Web application puts a piece of information on an HTML page that isn’t visible to users. This information is provided to help the Web server perform the next stage of processing, but the developer doesn’t consider the consequences of users modifying the data. For example, say a Web page receives a user’s customer service complaint and creates a form that mails the information to the company’s help desk when the user clicks Submit. If the application places e-mail addresses in the form to tell the mailing script where to send the e-mail, users could change the e-mail addresses and appear to be sending e-mail from official company servers.

Auditing Tip
Weaknesses in the HTTP authentication protocol can prove useful for attackers. It’s a fairly light protocol, so it is possible to perform brute-force login attempts at a rapid pace. HTTP authentication mechanisms often don’t do account lockouts, especially when they are authenticating against flat files or local stores maintained by the Web server. In addition, certain accounts are exempt from lockout and can be brute-forced through exposed authentication interfaces. For example, NT’s administrator account is immune from lockout, so an exposed Integrated Windows Authentication service could be leveraged to launch a high-speed password guessing attack.
You can find several tools on the Internet to help you launch a brute-force attack against HTTP authentication. Check the tools sections at www.securityfocus.com and www.packetstormsecurity.org.

Auditing Tip
When you review a Web site, you should pay attention to how it uses cookies. They can be easy to ignore because they are in the HTTP request and response headers, not in the HTML (usually), but they should be reviewed with the same intensity you devote to GET and POST parameters.
You can get access to cookies with certain browser extensions or by using an intercepting Web proxy tool, such as Paros (www.parosproxy.org) or SPIKE Proxy (www.immunitysec.com). Make sure cookies are marked secure for sites that use SSL. This helps mitigate the risk of the cookie ever being transmitted in clear text because of deliberate attacks, such as cross-site scripting, or unintentional configuration and programming mistakes and browser bugs.

Auditing Tip
Tracking state based on client IP addresses is inappropriate in most situations, as the Internet is filled to capacity with corporate clients going through NAT devices and sharing the same source IP. Also, you might face clients with changing source IPs if they come from a large ISP that uses an array of proxies, such as AOL. Finally, there is always the possibility of spoofing attacks that allow IP address impersonation.
There are better ways of tracking state, as you see in the following sections. As a reviewer, you should look out for any kind of state-tracking mechanism that relies solely on client IPs.

Auditing Tip
If you see code performing actions or checks based on the request URI, make sure the developer is handling the path information correctly. Many servlet programmers use request.getRequestURI() when they intend to use request.getServletPath(), which can definitely have security consequences. Be sure to look for checks done on file extensions, as supplying unexpected path information can circumvent these checks as well.

Auditing Tip
Generally, you should encourage developers to use POST-style requests for their applications because of the security concerns outlined previously. One issue to watch for is the transmission of a session token via a query string, as that creates a risk for the Web application’s clients. The risk isn’t necessarily a showstopper, but it’s unnecessary and quite easy for a developer or Web designer to avoid.

Data verification

Cases where information disclosure on the network is bad; or when forged or modified data can result in a security issue on the receiver.

Encryption may be necessary; data verification may be required.

Access to system resources

Many protocols allow users to request system resources implicitly or explicitly

Questions to consider:

Is credential verification for accessing the resource adequate?

Does the application give access to resources that it’s not supposed to? (i.e. the implementation is flawed and discloses more than intended)

HTTP

See chapter 17 for details.

Header parsing. Vulnerabilities are more likely when parsing a “folded header”. Code sometimes assumes headers are limited in length, but an arbitrarily long header can be supplied by using folded headers.

Accessing resources. HTTP is designed to serve content to clients.

Many examples of disclosing arbitrary files from the filesystem.

Encoding-related traversal bugs.

If the server implements additional features or keywords, check the corresponding code, more likely to have bugs.

CER Same as BER with restrictions: (used when large objects are transmitted; when all the object data is not available; when object sizes are not known at transmission time)

Constructed types must use an indefinite length encoding.

Primitive types must use the fewest encoding bytes necessary to express the object size.

DER smaller objects in which all bytes for objects are available and the lengths of objects are known at transmission time.

All objects must have a definite length encoding (no EOC)

The length encoding must use the fewest bytes necessary (same as CER)

Vulnerabilities in BER, CER, DER implementations

Tag encodings Some combinations of fields are illegal in certain variants of BER

e.g. in CER, an octet string of less than or equal to 1,000 bytes must be encoded using a primitive form rather than a constructed form. Is this really enforced? Look for differences between IDS processing and end host processing.

Can trick the implementation into reading more bytes than are available in the data stream.

Length encodings A common problem.

Multibyte encodings - when the length field is made to be more bytes than are left in the data stream.
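The length problems above can be sketched as a defensive BER length parser. This is an illustration under stated assumptions, not code from the book: `ber_read_length()` is an invented name, and indefinite lengths are simply treated as unsupported.

```c
#include <stddef.h>

/* Parse a BER definite length from buf[0..avail). On success stores the
 * content length in *len and the number of length octets consumed in
 * *used, and returns 0. Returns -1 on malformed or unsupported input. */
int ber_read_length(const unsigned char *buf, size_t avail,
                    size_t *len, size_t *used)
{
    size_t n, i, value = 0;

    if (avail < 1)
        return -1;
    if (buf[0] < 0x80) {                 /* short form: one octet */
        *len = buf[0];
        *used = 1;
    } else {
        n = buf[0] & 0x7F;               /* count of following octets */
        if (n == 0 || n == 0x7F)         /* indefinite or reserved */
            return -1;
        if (n > sizeof(size_t))          /* would not fit in size_t */
            return -1;
        if (n > avail - 1)               /* length octets past the data */
            return -1;
        for (i = 0; i < n; i++)
            value = (value << 8) | buf[1 + i];
        *len = value;
        *used = 1 + n;
    }
    if (*len > avail - *used)            /* content past the data */
        return -1;
    return 0;
}
```

The two checks against `avail` are the ones that implementations tend to miss: one for the length field itself and one for the content it announces.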

Packed encoding rules (PER)

More compact than BER. Can be used only to encode values of a single ASN.1 type. Consists of 3 fields: preamble, length and contents

Preamble - a bit map used when dealing with sequence, set, and set-of-data types.

Length - more complex than in BER. Aligned variants and unaligned variants. Constrained, semiconstrained and unconstrained.

The program decoding a PER bit stream must already know the structure of an incoming ASN.1 stream so that it knows how to decode the length: whether it is constrained or unconstrained, and what the boundaries are for constrained lengths.

Vulnerabilities in PER

A variety of integer related issues. Problems are more restricted because the values are more constrained.

In particular, for unconstrained lengths the bottom 6 bits can only be 1 to 4, but the implementation might not enforce this rule.

Checking return values incorrectly.

XML encoding rules

Very different problems because this is a text markup language. XER prologue and an XML document element that describes a single ASN.1 object. Prologue does not have to be used.

DNS

Fully functional resolver knows what to do when a non-recursive DNS server doesn’t have an answer

Stub resolver relies on a recursive name server to do all the work

Zones

Resource record types

DNS protocol structure

DNS name encoding and buggy parsers (3www6google3com)

Sample problems:

Failure to deal with invalid label lengths. The maximum size for a label is 63 bytes because setting the top 2 bits indicates that the byte is the first in a two-byte pointer, leaving 6 bits to represent a label length. That means any label in which one of the top bits is set but the other one isn’t is an invalid length value.

Insufficient destination length checks

Insufficient source length checks

Pointer values not verified in the packet

Special pointer values (when pointer compression is used)

Length variables. (There are no 32-bit integers to specify data lengths in the DNS protocol; everything is 8 or 16 bits)
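Several of the sample problems above can be seen in one place in a small name decoder. This is a sketch with an invented function name. It rejects compression pointers instead of following them, which sidesteps the pointer checks rather than demonstrating them.

```c
#include <stddef.h>
#include <string.h>

/* Decode an uncompressed DNS name starting at pkt[off] into dotted
 * form. Returns the length of the decoded string, or -1 on malformed
 * input or if the output buffer is too small. */
int dns_decode_name(const unsigned char *pkt, size_t pktlen, size_t off,
                    char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz < 1)
        return -1;
    for (;;) {
        unsigned char l;

        if (off >= pktlen)
            return -1;               /* ran off the end of the packet */
        l = pkt[off++];
        if (l == 0)
            break;                   /* root label ends the name */
        if (l & 0xC0)
            return -1;               /* pointer, or invalid 01/10 prefix */
        if (l > pktlen - off)
            return -1;               /* source length check */
        if (o + l + 2 > outsz)
            return -1;               /* destination: '.' + label + NUL */
        if (o > 0)
            out[o++] = '.';
        memcpy(out + o, pkt + off, l);
        o += l;
        off += l;
    }
    out[o] = '\0';
    return (int)o;
}
```

The encoded form of the example above, 3www6google3com followed by a zero byte, decodes to `www.google.com`, while a label byte of 0x40 or a label that claims more bytes than the packet holds is rejected.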

TCP streams

TCP state processing

Urgent pointer processing

Neglecting to check that the pointer is within the bounds of the current packet

Recognising that the pointer is pointing beyond the end of the packet and trying to handle it (often incorrectly)

Handling 0-offset urgent pointers

0 offset URG pointer is invalid

Simultaneous open

Both peers send a SYN packet at the same time with mirrored source and destination ports. Then they both send a SYN-ACK packet, and the connection is established.

Ch. 15 Firewalls

Intro

Attack surface - Proxy firewalls

Same issues as with network servers. Also make sure the firewall makes a clear distinction between internal and external users or tracks authorised users.

Packet-filtering firewalls

Stateless vs stateful filters

Stateless Firewalls

TCP
Stateless firewalls look for connection initiation packets - SYNs, and more or less let other packets go through. Can be abused for FIN scanning (not sure this works anymore). Stateless FW has to let FIN and RST packets through. Different stacks behave differently for weird combinations of flags. Eg. SYN-FIN may initiate a connection.

UDP
Only port-based rules. Return packets a big problem - e.g. DNS replies from servers. Effectively creates a hole for UDP scanning with a source port 53.

FTP
Active / passive FTP; active is a problem for stateless firewalls, similar to UDP above but with TCP.

Fragmentation
Either deny completely or apply very simple set of rules to process. No tracking because stateless. Some rules:

Fragments with low IP offset (1,2 etc) - drop as they will mess with TCP flags

Fragments with 0 offset should contain the full header, otherwise drop

Multiple offset 0 fragments - drop all after the full header

Fragments with high offset can pass
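A sketch of how such per-fragment rules might look in code. The constants are assumptions: a 20-byte minimal TCP header, and offsets (in 8-byte units) below 3 counted as "low". The rule about multiple offset-0 fragments is left out because it needs memory of earlier fragments.

```c
enum verdict { PASS, DROP };

#define MIN_L4_HEADER    20  /* assumed: minimal TCP header, in bytes */
#define LOW_OFFSET_LIMIT 3   /* assumed: offsets 1 and 2 overlap it */

/* Verdict for a single IPv4 fragment looked at in isolation.
 * frag_off is in 8-byte units, as in the IP header; payload_len is the
 * number of bytes following the IP header. */
enum verdict filter_fragment(unsigned frag_off, unsigned payload_len)
{
    if (frag_off == 0)               /* first fragment: need full header */
        return payload_len >= MIN_L4_HEADER ? PASS : DROP;
    if (frag_off < LOW_OFFSET_LIMIT) /* could overwrite the TCP flags */
        return DROP;
    return PASS;                     /* high offsets are let through */
}
```

Offsets 1 and 2 correspond to bytes 8 and 16 of the payload, both inside a 20-byte TCP header, which is why they are dropped here.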

Simple stateful firewalls

TCP

These days any issues are rare

UDP

A common mistake is to allow responses from any UDP port

Directionality

Fragmentation handling

Can be done better than with stateless FWs. Bugs existed

Fooling virtual reassembly

IP TTL field

IP options

Stateful inspection firewalls

Checkpoint’s original term - looking inside the packet

Layering issues

Firewalls are not doing full TCP/IP processing and so make mistakes because they peek at layers they do not understand

For FTP, simplistic port lookup in the packet can be fooled into creating connections in the state table by faking 227 responses in the packet

Spoofing attacks

Obviously cannot muck with the destination IP.

Spoofing from an internal trusted source

Spoofing for a response

Try to get hosts to respond to addresses you cannot reach otherwise

Especially with source address 224.0.0.1 or 127.0.0.1

Spoofing for a state entry - to get special entries added to the firewall state table for later use

Sunday, 7 August 2016

Ch. 13 Synchronisation and State

Synchronisation problems

Mutex

Reentrancy - a function’s capability to work correctly even when it’s interrupted by another running thread that calls the same function. It must not modify any global vars or shared resources w/o adequate locking.

Race conditions

In race conditions the outcome of an operation is successful only if certain resources are acted on in an expected order.

Starvation and deadlocks

Starvation - a thread never receives ownership of a synchronisation object.

Deadlocks can occur when several threads are using multiple sync objects at once but in a different order. For a deadlock to be possible, 4 conditions are required: mutual exclusion, hold and wait, no preemption, circular wait.
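
A minimal sketch of "adequate locking" around a shared global, using Python threads instead of the C APIs the book discusses. All names here are made up for the example.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1


def run(threads=4, n=10000):
    """Start several workers and return the final counter value."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(n,)) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter
```

With the lock held around every read-modify-write, the result is always threads * n. Without it the outcome would depend on how the threads happen to be scheduled.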

Process synchronisation

System V process synchronisation

Semaphore - a locking device that uses a counter to limit the number of instances that can be acquired. Decremented when acquired, incremented when released.

semget() - create a new semaphore set or obtain an existing set

semop() - perform operations on selected semaphores in a set

semctl() - perform a control operation on a selected semaphore
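
The counter behaviour can be shown with Python's in-process threading.BoundedSemaphore. This is only an analogy: it is not the System V semget()/semop() interface, and the function name is mine.

```python
import threading


def semaphore_demo(limit=2):
    """Try to acquire limit + 1 times without blocking, then release once and retry."""
    sem = threading.BoundedSemaphore(limit)
    results = [sem.acquire(blocking=False) for _ in range(limit + 1)]
    sem.release()  # increments the counter again
    results.append(sem.acquire(blocking=False))
    return results
```

With a limit of 2, the third acquire fails because the counter has reached zero, and it succeeds again after one release.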

Windows process synchronisation

<skipped>

Vulnerabilities with interprocess synchronisation

Synch objects required but not used, e.g. when 2 processes are attempting to access a shared resource

Incorrect use (Windows)

Squatting with named synchronisation objects (Windows)

Helpful tools/notes:

Synchronisation objects scoreboard

Lock matching

Signals

Signals are software interrupts that the kernel raises in a process at the request of other processes, or as a reaction to events that occur in the kernel.
Possible actions:

Ignore the signal (apart from SIGKILL and SIGSTOP)

Block the signal (same exception)

Install a signal handler

kill() system call is used to send a signal to a process

signal() for installing a handler

sigaction() interface - more detailed attributes for handled signals

setjmp(), longjmp(), sigsetjmp(), siglongjmp() are often used in signal-handling routines to return to a certain location in the program in order to continue processing after a signal has been caught. The program context saved by setjmp() is restored when longjmp() is called. A zero return value from setjmp() means a direct call; a non-zero value indicates a return via longjmp()
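
A small sketch of installing a handler and sending a signal to the current process, using Python's signal module as a stand-in for signal()/kill(). The names are mine, and Python only runs handlers in the main thread between bytecodes, so this is gentler than a real C handler.

```python
import os
import signal
import time

caught = []


def handler(signum, frame):
    # Only record the signal; keep the handler as small as possible.
    caught.append(signum)


def send_to_self(sig=signal.SIGUSR1, timeout=1.0):
    """Install the handler, signal our own PID, and return the signals seen."""
    caught.clear()
    previous = signal.signal(sig, handler)
    try:
        os.kill(os.getpid(), sig)
        deadline = time.monotonic() + timeout
        while not caught and time.monotonic() < deadline:
            time.sleep(0.01)
        return list(caught)
    finally:
        signal.signal(sig, previous)  # restore the earlier disposition
```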

Signal vulnerabilities

Signal handlers need to be asynchronous-safe: an asynchronous-safe function can run safely and correctly even if it is interrupted by an asynchronous event. It is reentrant by definition, but it must also correctly deal with signal interruptions.
Problems arise when the handler relies on some sort of global program state, such as the assumption that global variables are initialised when in fact they aren’t.
Various problems (non-asynchronous-safe state) may arise from attempting to restart execution using the longjmp() function in non-returning signal handlers.
Other problems can be caused by invalid longjmp targets. The function that calls setjmp or sigsetjmp must still be on the runtime execution stack whenever longjmp or siglongjmp is called. If the original function has terminated, the pointer will be invalid.
Pay special attention to longjmp() in signal handlers, for the following reasons:

The signal handler doesn’t return, so it’s highly unlikely that it will be asynchronous safe unless it exits immediately.

It might be possible to find a code path where the function that did the setjmp returns, but the signal handler with the longjmp is not removed.

The signal mask might have changed, which could be an issue if sigsetjmp and siglongjmp aren’t used. If they are, does restoring the old signal mask cause problems as well?

Permissions might have changed.

Program state might have changed, so variables that were valid when setjmp is originally called are not necessarily valid when longjmp is called.

The signal handler itself can be interrupted or called more than once. A signal handler can be interrupted only if a signal that isn’t blocked is delivered to the process. Signals are blocked by using the sigprocmask() function, or implicitly - signals of the type the handler catches are blocked for the period of time the signal handler is running. The sigaction() function can also specify additional signals to block while the handler runs.
Sometimes non-async safe functions are used in signal handlers (see signal(3) or sigaction(2))
Signal handlers using longjmp and siglongjmp are practically guaranteed to be non-async safe unless they jump to a location that immediately exits.
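
Blocking can be tried out from Python with signal.pthread_sigmask(), the per-thread relative of sigprocmask(). A rough sketch with made-up names, assuming a single-threaded process: a blocked signal stays pending and the handler only runs once the mask is restored.

```python
import os
import signal
import time


def blocked_delivery_demo(sig=signal.SIGUSR1):
    """Return the order of events when a signal is sent while it is blocked."""
    events = []
    previous = signal.signal(sig, lambda signum, frame: events.append("handled"))
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {sig})
    try:
        os.kill(os.getpid(), sig)
        # The signal is queued, not delivered, while it is blocked.
        events.append("pending" if sig in signal.sigpending() else "not pending")
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)  # unblock
        deadline = time.monotonic() + 1.0
        while "handled" not in events and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(sig, previous)
    return events
```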

Threads

PThreads API is the primary API on UNIX. Uses mutexes and condition variables. Linux has a modified version - LinuxThreads. On Windows the API is more complicated.
<skipped> - Critical sections

Threading Vulnerabilities

Race conditions occur when the successful outcome of an operation depends on whether the threads are scheduled for running in a certain order.

Auditing:

Identify shared resources that are acted on by multiple threads.

Determine whether the appropriate locking mechanism has been selected. There are specific rules in the book for different types of resources.

Examine the code that modifies this resource to see whether appropriate locking mechanisms have been neglected or misused.

Deadlocks and starvation

In PThreads deadlocks are more likely to occur from the use of multiple mutexes. A classic situation: two or more locks can be held by a single thread, and another thread can acquire the same locks in a different order.
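
One common way to break the "different order" condition is to always take locks in a single global order. A sketch with Python locks rather than pthread mutexes; the helper names are hypothetical and ordering by id() is just one convenient choice.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()


def acquire_in_order(*locks):
    """Acquire all locks in one global order, whatever order the caller listed them in."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered


def release_all(ordered):
    for lock in reversed(ordered):
        lock.release()


def worker(first, second, log, name):
    held = acquire_in_order(first, second)
    try:
        log.append(name)
    finally:
        release_all(held)


def run():
    """Two threads ask for the same locks in opposite order; both still finish."""
    log = []
    t1 = threading.Thread(target=worker, args=(lock_a, lock_b, log, "t1"))
    t2 = threading.Thread(target=worker, args=(lock_b, lock_a, log, "t2"))
    t1.start()
    t2.start()
    t1.join(timeout=5)
    t2.join(timeout=5)
    return sorted(log)
```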

Saturday, 6 August 2016

Ch 10. UNIX II: Processes

Processes

fork() creates new processes. Returns in parent the PID of the new child process; in the child process - 0. Return value -1 means call failed, no child spawned

getppid() - get parent PID

If a process terminates while its children are still running, these children are assigned to init (PID 1)

In Linux clone() is a fork() variant that allows callers to specify several parameters of the forking operation
Child inherits a copy of most resources from the parent. For files - different. Child gets a copy of the parent’s file descriptors, and both processes share the same open file structure in the kernel (which points to an inode). As a result parent and child may be fighting for access to the file.
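
The shared open file structure is easy to observe: if the child writes through its copy of the descriptor, the parent's file offset moves too. A sketch using os.fork(); the function name is mine.

```python
import os
import tempfile


def shared_offset_demo():
    """Fork, let the child write 5 bytes, and return the parent's file offset."""
    fd, path = tempfile.mkstemp()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: pid is 0 here. Write and exit without running parent cleanup.
            try:
                os.write(fd, b"child")
            finally:
                os._exit(0)
        # Parent: pid is the child's PID.
        os.waitpid(pid, 0)
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)
```

The parent never wrote anything, yet its offset is 5 afterwards, because both descriptors point at the same open file structure in the kernel.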

Program invocation

execve() is the standard way of invoking programs. execvp() and execlp(): if the filename contains no slashes, they use the PATH env variable to resolve the location of the executable. They also open a shell to run the file if execve fails with ENOEXEC.
It may be possible to supply program switches in the argument array if it is not sanitised properly. Keep in mind that getopt() interprets only the arguments preceding -- (two dashes)
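
The effect of the two dashes can be checked with Python's getopt module, which follows the same convention as the C getopt(). The option string "vf" and the helper name are arbitrary choices for the example.

```python
import getopt


def parse(argv):
    """Return (switches seen, remaining positional arguments)."""
    opts, args = getopt.getopt(argv, "vf")
    return [name for name, _ in opts], args
```

parse(["-v", "-f"]) treats a user-supplied "-f" as a switch, while parse(["-v", "--", "-f"]) leaves it as a plain argument. A caller building an argument array from untrusted input would put "--" before that input.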

Metacharacters - see [[TAOSSA notes ch 8]]

Globbing

Environment issues

Setuid shell scripts

Process Attributes

Process attribute retention:

File descriptors usually get passed on from the old process to the new one

Signal masks - the new process loses all signal handlers installed by the previous process but retains the same signal masks

Effective UID - if the program is setuid, the EUID becomes the user ID of the program file owner. Otherwise it stays the same across the execution.

Effective GID - if setgid, the EGID becomes the group ID of the program file group

Saved set-UID - set to the value of the EUID after any setuid processing has been completed

Saved set-GID - similar

Real UID, GID - preserved across execution

PID, PPID, PGID - don’t change across an execve() call

Supplemental group privileges are retained

Working dir, root dir - same

Controlling terminal - inherits from the old process.

Resource limits - a lot of details

Umask - inherited by the new program

Users can set tight limits on a process and then run a setuid or setgid program. Rlimits are inherited by the child on fork() (only the resource usage counters are reset) and they survive the exec() family of calls, which can be used to force a failure in a predetermined location in the code. The error-handling code is usually less guarded than more well-traveled code paths.
UNIX does allow developers to mark certain file descriptors as close-on-exec, which means they are closed automatically if the process runs a new program. For applications that spawn new processes at any stage, always check to see whether this step is taken when it opens files. It is also useful to make a note of those persistent files that aren’t marked to close when a new program starts.
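
When auditing, the close-on-exec flag can be inspected with fcntl(F_GETFD). A sketch in Python with a made-up function name; note that Python marks new descriptors non-inheritable by default, which C does not, so in C the flag has to be requested explicitly.

```python
import fcntl
import os
import tempfile


def cloexec_flags():
    """Return the FD_CLOEXEC state: by default, after allowing inheritance, after revoking it."""
    fd, path = tempfile.mkstemp()
    try:
        def is_cloexec():
            return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

        default = is_cloexec()
        os.set_inheritable(fd, True)    # clears FD_CLOEXEC: fd would survive exec
        inheritable = is_cloexec()
        os.set_inheritable(fd, False)   # sets FD_CLOEXEC again
        restored = is_cloexec()
        return default, inheritable, restored
    finally:
        os.close(fd)
        os.unlink(path)
```
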
Security checks on a file descriptor are performed only once, when the process initially creates a file descriptor by opening or creating a resource. If you can get access to a file descriptor that was opened with write access to a critical system file, you can write to that file regardless of your effective user ID or other system privileges. Therefore, programs that work with file descriptors to security-sensitive resources should close their descriptors before running any user-malleable code.
setenv() and unsetenv() may be dodgy in how they behave with funny variable names.

Interprocess communication

Named pipes created with insufficient privileges might result in unauthorized clients performing some sort of data exchange, potentially leading to compromise via unauthorized (or forged) data messages.
Applications that are intended to deal with regular files might unwittingly find themselves interacting with named pipes. This allows attackers to cause applications to stall in unlikely situations or cause error conditions in unexpected places. When auditing an application that deals with files, if it fails to determine the file type, consider the implications of triggering errors during file accesses and blocking the application at those junctures.
The use of mknod() and mkfifo() might introduce a race condition between the time the pipe is created and the time it’s opened.
Three IPC mechanisms in System V IPC are message queues, semaphores, and shared memory.
Named UNIX domain sockets provide a general-purpose mechanism for exchanging data in a stream-based or record-based fashion.

Ch 9. Unix I: Privileges and objects

UID functions

seteuid(). Changes the effective user ID associated with the process. If a process is running with superuser privileges (effective user ID of 0), it can set the effective user ID to any arbitrary ID. Otherwise, for non-root processes, it can toggle the effective user ID between the saved set-user-ID and the real user ID

setuid(). Changes all 3 UIDs, is used for permanently assuming the role of a user, usually for the purposes of dropping privileges

setresuid(). Explicitly set all 3 UIDs. “-1” is used for “keep the same”. Non super-user can set any of the 3 to a value of any currently assigned 3 UIDs. Super-user - to any value.

setreuid(). Set real UID and effective UID, similar to setresuid. More important on Solaris and older BSDs that don’t have setresuid

Before OpenBSD imported the setresuid() function and rewrote the setreuid() function, the only straightforward way for a nonprivileged program to clear the saved set-user-ID was to call the setuid() function when the effective user ID is set to the real user ID. This can be accomplished by calling setuid(getuid()) twice in a row.

Group ID functions

setegid()

setgid()

setresgid()

setregid()

setgroups(). Sets the supplementary groups for the process. Can only be performed by a process with an effective UID of 0

Privilege vulnerabilities

A program can drop its root privileges by performing a setuid(getuid()), which sets the saved set-user-ID, the real user ID, and the effective user ID to the value of the real user ID.
A setgid+setuid program can drop root privileges by calling setgid(getgid()) and then setuid(getuid()); the order is important.

If the order is reversed, in Linux, Solaris, and OpenBSD, only the effective group ID is modified, and the saved set-group-ID still contains the group ID of the privileged group.
The same pair of calls in non-root setuid programs only changes the effective IDs, not the saved IDs (in FreeBSD and NetBSD all three IDs are changed).
A similar case is when privileges are temporarily dropped and setuid() is then called while the effective UID is non-zero (not root). In most implementations this does not affect the saved user ID, and root privileges can be recovered with seteuid(0).
Another situation - incorrect attempts to drop privileges temporarily
The book has a couple of pages of checklists for auditing privilege-management code

setgroups() works only when running with euid 0

Attempting to drop privileges while not running with euid 0 will not work

Using setegid() or seteuid() to drop root privileges is a mistake

Privileged groups and supplemental groups must be dropped before the process gives up its effective user ID of root

setuid(getuid()) for non-root leaves the saved UID set to a privileged user (likewise setgid(getgid()) and the saved GID)

For temporary dropping of privileges:

Make sure the code drops any relevant group permissions as well as supplemental group permissions.

Make sure the code drops group permissions before user permissions.

Make sure the code restores privileges before attempting to drop privileges again, either temporarily or permanently.
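
For the permanent case, the ordering points from the checklists can be sketched as follows. This is my own illustration, not the book's code: drop_privileges_permanently() and RecordingOS are hypothetical names, and the osmod parameter exists only so the call order can be dry-run without being root.

```python
import os


class RecordingOS:
    """Hypothetical stand-in for the os module that records the call order."""

    def __init__(self):
        self.calls = []

    def setgroups(self, groups):
        self.calls.append("setgroups")

    def setgid(self, gid):
        self.calls.append("setgid")

    def setuid(self, uid):
        self.calls.append("setuid")

    def seteuid(self, uid):
        self.calls.append("seteuid")
        raise PermissionError("not permitted")


def drop_privileges_permanently(uid, gid, osmod=os):
    """Drop supplemental groups, then group IDs, then user IDs, then verify."""
    osmod.setgroups([gid])  # needs euid 0, so it has to come first
    osmod.setgid(gid)       # group IDs before user IDs
    osmod.setuid(uid)       # as root this sets real, effective and saved UID
    try:
        osmod.seteuid(0)    # verification: regaining root must fail
    except OSError:
        return True
    raise RuntimeError("root privileges can still be regained")
```

Checking that seteuid(0) fails afterwards is a cheap way to catch the cases above where the saved set-user-ID was left untouched.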

File security

File IDs

The kernel sets the file’s owner and group when the file is first created. The owner is always set to the effective user ID of the process that created the file.

There are two common schemes by which the group ID can be initialised.

BSD-based systems tend to set the initial group ID to the group ID of the file’s parent directory.

The System V and Linux approach is to set the group ID to the effective group ID of the creating process.

File permissions

The four components of the permission bitmask are owner permissions, group permissions, other permissions, and a set of special flags.

The kernel looks only at the most specific set of permissions relevant to a given user. It’s a common misunderstanding to think that the less specific permission bits are consulted if the more specific permissions prevent an action.

The three special permission bits are the setuid bit, the setgid bit, and the sticky (or tacky) bit.
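
The "most specific set only" rule, written out as a sketch. The function and its parameters are hypothetical, and the superuser bypass and ACLs are deliberately left out.

```python
def may_access(mode, file_uid, file_gid, euid, groups, want):
    """Check want (bitmask of 4=read, 2=write, 1=execute) against exactly one permission triplet."""
    if euid == file_uid:
        bits = (mode >> 6) & 7   # owner bits, even if group/other would allow more
    elif file_gid in groups:
        bits = (mode >> 3) & 7   # group bits
    else:
        bits = mode & 7          # other bits
    return (bits & want) == want
```

With mode 047 the owner cannot read the file although everybody else can: may_access(0o047, 1000, 100, 1000, {100}, 4) is False, while the same call with euid 2000 and groups {200} is True.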

Umask

To calculate the initial permission bits for a new file, the permission argument (mode) of the file creation system call is combined, using a bitwise AND operation, with the complement of the umask value.

Umask is inherited by the new program; default umask is usually 022
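
The calculation is mode & ~umask. A small sketch that computes it and also checks it against what the kernel actually does; both function names are mine, and os.umask() changes the mask for the whole process, so this is not something to run in threaded code.

```python
import os
import stat
import tempfile


def initial_mode(requested, umask):
    """Permission bits a new file should end up with."""
    return requested & ~umask & 0o7777


def created_mode(requested, umask):
    """Create a file under the given umask and return its real permission bits."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "example")
    old = os.umask(umask)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, requested)
        try:
            return stat.S_IMODE(os.fstat(fd).st_mode)
        finally:
            os.close(fd)
    finally:
        os.umask(old)
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(directory)
```

For example, a requested mode of 0666 under umask 022 gives 0644, and under umask 077 it gives 0600.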

Privilege management with file operations

File opening is typically done with the open(), creat(), mknod(), mkdir(), or socket() system calls; a file’s directory is altered with calls such as unlink() and rename(); and file attributes are changed with calls such as chmod(), chown(), or utimes(). All these privilege checks consider a file’s permission bitmask, ownership, and group membership along with the effective user ID, effective group ID, and supplemental groups of the process attempting the action. Effective permissions of the process are critical.
Sources of issues with file permissions:

Recklessness with permissions

Libraries doing stuff in the background

Permissions when creating files

Unix open() interface, specific mode and its umask interaction

Forgetting O_EXCL (if open() is called with O_CREAT but not O_EXCL, the system might open an existing file instead of creating a new one)

setuid root files created by the less privileged users

Directory safety (who owns the directory the file is in, who can write to it). All parents in the path must be safe.

Filenames and paths: absolute and relative; special entries. Every time you use a system call that takes a pathname, the kernel goes through the process of stepping through each directory to locate the file. For the kernel to follow a path, you must have search permissions on every directory in that path.

Pathname tricks, dir traversal

Embedded NUL

Dangerous dirs
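
The O_EXCL item from the list above, as a sketch: with O_CREAT | O_EXCL the open fails if the name already exists instead of silently reusing somebody else's file. The helper names and the file name are made up.

```python
import os
import tempfile


def create_new(path, mode=0o600):
    """Create path with tight permissions, refusing to reuse an existing file."""
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)


def demo():
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "report")
    try:
        os.close(create_new(path))
        try:
            os.close(create_new(path))
        except FileExistsError:
            return "second create refused"
        return "existing file reopened"
    finally:
        os.unlink(path)
        os.rmdir(directory)
```

Passing the final mode at creation time also avoids creating the file with loose permissions and tightening them afterwards.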

File internals

File descriptor

File descriptor table

Inodes

Directories

Links - symlinks, hard links

Race conditions

Race conditions are situations in which two different parties simultaneously try to operate on the same resource with deleterious consequences.

For UNIX file system code, these issues usually occur when you have a process that gets preempted or enters a blocking system call at an inopportune moment. This moment is typically somewhere in the middle of a sensitive multiple-step operation involving file and directory manipulation. If another process wins the race and gets scheduled at the right time in the middle of this “window of inopportunity,” it can often subvert a vulnerable nonatomic sequence of file operations and wrest privileges from the application.

Stat() family of functions

stat(), fstat(), lstat()

fstat() is the most resilient in terms of race conditions, as it operates on a previously opened file

lstat() does not follow links, stat() does

Standard protection against link-based attacks is to use lstat() on a requested filename and either explicitly check whether it’s a link, or check that it’s a regular file and fail when it’s not

Beware of TOCTOU issues in the above scenario

It is possible to delete or rename links while files are open. The kernel does not care if the file that the fd indexes has been deleted or renamed. As long as the file descriptor is kept open, the file and the corresponding inode in the file system stay available. This can be used when the program checks after opening the file
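
One way to narrow the TOCTOU window is to lstat() the name, open it without following links, and then fstat() the descriptor to confirm it is still the same inode. A sketch under those assumptions, with hypothetical names; it relies on O_NOFOLLOW being available and does not look at hard links.

```python
import os
import stat
import tempfile


def open_regular_file(path):
    """Open path read-only, rejecting symlinks and files swapped between check and open."""
    before = os.lstat(path)
    if not stat.S_ISREG(before.st_mode):
        raise OSError("not a regular file")
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    after = os.fstat(fd)
    same_inode = (before.st_dev, before.st_ino) == (after.st_dev, after.st_ino)
    if not same_inode or not stat.S_ISREG(after.st_mode):
        os.close(fd)
        raise OSError("file changed between lstat() and open()")
    return fd


def demo():
    directory = tempfile.mkdtemp()
    target = os.path.join(directory, "data")
    link = os.path.join(directory, "link")
    with open(target, "w") as handle:
        handle.write("x")
    os.symlink(target, link)
    try:
        os.close(open_regular_file(target))
        try:
            os.close(open_regular_file(link))
        except OSError:
            return "symlink rejected"
        return "symlink followed"
    finally:
        os.unlink(link)
        os.unlink(target)
        os.rmdir(directory)
```

After this point everything should go through the descriptor (fstat(), fchmod(), fchown()), not the pathname.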

Recap and other races

Most file system race conditions can be traced back to using system calls that work with pathnames

Permissions are established by how the file is opened and the security checks at that time, so changing permissions later will not affect access

Anything besides a single filename-based sys call to open a resource followed by multiple file-descriptor-based calls has a chance of a race condition

Evading file access checks: another vulnerability pattern - a security check function that uses a filename, followed by a usage function that uses a filename

Permission races: an app temporarily exposes a file to modification for a short window of time by creating it with insufficient permissions. If attackers can open the file during that window, they retain access even after permissions have been corrected

Ownership races: a file is created with the effective privileges of a non-privileged user, then later the file owner is changed to that of a privileged user. If attackers open the file between open() and fchown(), they get a fd with an access mask permitting read and write to the file

Directory races: if a program descends into user-controllable directories, the user can move directories around and cause the program to operate on sensitive files

Temporary files

Temp directories are marked as sticky directories with mode octal 1777

Unique file creation

mktemp() generates a very easily predictable unique name, based on the process ID of the caller plus a static pattern

This can be used in race condition scenarios

tmpnam() and tempnam() have the same race condition issues as mktemp()

mkstemp() is much safer if used correctly

tmpfile() and mkdtemp() - safe functions
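
In Python the mkstemp() behaviour is available as tempfile.mkstemp(), which creates and opens the file in one step and hands back the descriptor. A short sketch; the function name and prefix are arbitrary.

```python
import os
import stat
import tempfile


def make_temp(data):
    """Create a temp file, write data through the returned fd, and report its mode."""
    fd, path = tempfile.mkstemp(prefix="notes-")
    try:
        os.write(fd, data)
        mode = stat.S_IMODE(os.fstat(fd).st_mode)
    finally:
        os.close(fd)
    return path, mode
```

The point is to keep using the descriptor that was returned and never reopen the file by name. Under a normal umask the file ends up with mode 0600.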

File reuse

Applications also might have a requirement to open temporary files that already exist in a temporary directory. Opening these files is difficult.
Preventing opening soft or hard links is difficult.
Cryogenic sleep attack - sending a job control signal such as SIGSTOP to the application at the right moment then manipulating files. Possible if the program is a setuid root program users had started in their terminal session

STDIO file interface

UNIX application code commonly uses stdio in lieu of the lower-level system call API because it automatically implements buffering and a few convenience functions for data formatting.

A typical FILE structure contains a pointer to buffered file data (if it’s a buffered stream), the file descriptor, and flags related to how the stream is opened.

fopen() is used for opening files. Same potential problems as open(). If the implementation does not take a mask as a parameter, it applies a default mask of 0666 and then further restricts permissions based on the umask of the current process. In a privileged context, it should be used very carefully.

freopen() has the same problems, fdopen() does not.

fread() is similar to read() but reads a specified number of items of a specified size. Multiplication is involved, potential for integer overflow.

fgets() reads a single line from the file. Potential problems: ignoring the return value - if it returns NULL, contents of the destination buffer are unspecified. Another one: when the file containing user-controlled data is incorrectly parsed (because fgets reads up to x chars but not the whole line).

fscanf() reads data of a specified format directly into vars. Potential for buffer overflows when using this function to read in string values. Also need to check the return value.
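
The fread() multiplication issue can be mimicked in Python by masking to a fixed width. This is only a model of C arithmetic: the 32-bit size_t and both function names are assumptions for the example.

```python
SIZE_MAX_32 = 0xFFFFFFFF  # assumption: 32-bit size_t


def wrapped_product(size, count):
    """What size * count becomes when it silently wraps at 32 bits."""
    return (size * count) & SIZE_MAX_32


def checked_product(size, count):
    """Refuse the multiplication if it would not fit in 32 bits."""
    if size and count > SIZE_MAX_32 // size:
        raise OverflowError("size * count does not fit in 32 bits")
    return size * count
```

For example, 0x10000 items of 0x10001 bytes wrap around to just 0x10000 bytes, so a buffer sized from the wrapped value would be far too small for the data actually read.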
With writing to a file there are more limitations for users to affect the application, because the data being manipulated is already in memory. Much fewer security implications of writing it into a file.

Potential format string vulnerabilities for printf family; users messing with file format

fclose() - if called twice on a FILE structure a double free() would occur, with a possibility of corrupting the heap.