Research on Distributed Service Discovery for Semantic Web Services

–An Ontology Partition and Prefix-routing based Semantic Web Service Discovery

ABSTRACT

Semantic Web Services use ontology-based Semantic Web markup languages to create computer-interpretable service descriptions, which makes it possible to discover Web Services automatically. In the real world, Semantic Web Services are usually described by a set of concepts belonging to large-scale domain ontologies. As the number of concept combinations grows, the number of Web Services also grows dramatically. How to efficiently manage the huge number of Web Services described by large-scale ontologies has become a critical issue in large-scale service discovery.

Because an ontology graph has an intrinsically fine hierarchy and modularity, this paper partitions the ontology graph into several concept groups of semantically similar concepts. The Semantic Web Services are then clustered by mapping their description concept sets to the concept group sets, and a structured P2P network is proposed to manage the clusters.

Based on the ROCK (A Robust Clustering Algorithm for Categorical Attributes) clustering algorithm, this paper introduces ROCKOn2, a partitioning algorithm for large-scale ontology graphs. ROCKOn2 partitions the large-scale ontology graph into several concept groups. The proposed ROCKOn2Cluster algorithm then gathers the distributed Web Services that belong to the same concept groups. To manage the clustered services, this paper proposes Spring, a Semantic Web Service discovery system based on ontology partitioning and prefix routing. Through the ROCKOn2 algorithm, the contents and nodeId of peers are represented by concept groups of varying length; Spring is a structured, integrated semantic P2P system whose prefix-routing mechanism supports varying encoding lengths. Experiments and practical use in the PISOMWare application show that Spring has stable routing hops and efficient discovery, which makes it suitable for large-scale ontology-based distributed Web applications.

Recently I needed to measure the time cost of some test cases running on Linux. After searching online, I found an answer: use time.clock or time.time. A sample test with time.clock worked well on Windows, but when I ran it under Linux the result was totally different.

My code is shown below; it is quite simple and just calls time.clock and time.time:

# Python 2 syntax (print statement, time.clock)
import time

# Measure a 2-second sleep with time.clock
print '-------'
start = time.clock()
print 'start = ', start
time.sleep(2)
end = time.clock()
print 'end = ', end
print 'elapse = ', end - start

# Measure the same sleep with time.time
print '-------'
start = time.time()
print 'start = ', start
time.sleep(2)
end = time.time()
print 'end = ', end
print 'elapse = ', end - start

The output of time.clock is quite different on Linux and Windows.

Linux:

---- time.clock ----
start = 0.01
end = 0.01
elapse = 0.0
---- time.time ----
start = 1316423764.45
end = 1316423766.45
elapse = 2.00015687943

Windows:

---- time.clock ----
start = 1.5398325817e-06 #this is really small
end = 2.00183702027
elapse = 2.00183548044
---- time.time ----
start = 1316423430.99
end = 1316423432.99
elapse = 2.00199985504

From the output above, we can see that on Linux the elapsed time reported by time.clock is zero, which is not what we expected. So what does time.clock really do on different platforms?

The descriptions below, copied from Python’s manual, show that time.clock behaves completely differently on the two platforms.

time.clock()

On Unix, return the current processor time as a floating point number expressed in seconds. The precision, and in fact the very definition of the meaning of “processor time”, depends on that of the C function of the same name, but in any case, this is the function to use for benchmarking Python or timing algorithms.

On Windows, this function returns wall-clock(ie, the time on the clock on your wall) seconds elapsed since the first call to this function, as a floating point number, based on the Win32 function QueryPerformanceCounter. The resolution is typically better than one microsecond.

time.time()

Return the time as a floating point number expressed in seconds since the epoch, in UTC. Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls.

Conclusion

If you’re writing code that’s meant only for Windows, either will work (though you’ll use the two differently – no subtraction is necessary for time.clock()). If this is going to run on a Unix system or you want code that is guaranteed to be portable, you will want to use time.time().
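For newer interpreters, here is a minimal sketch assuming Python 3.3 or later. time.clock was deprecated in Python 3.3 and removed in Python 3.8; the standard library now offers time.perf_counter() for elapsed wall-clock time and time.process_time() for CPU time, and both behave the same way on Linux and Windows. The helper name measure is hypothetical.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) spent running func(*args)."""
    wall_start = time.perf_counter()   # monotonic, high-resolution wall clock
    cpu_start = time.process_time()    # CPU time of this process; sleep is not counted
    func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure(time.sleep, 0.2)
    print("wall elapsed = %.3f s" % wall)
    print("cpu elapsed  = %.3f s" % cpu)
```

The CPU figure stays near zero for a sleep, which matches the zero that time.clock reported on Linux above: the process was idle, so it consumed almost no processor time.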

Unlike other OOP languages such as C++ and Java, which provide a private access modifier that can keep a constructor from being called and the class from being instantiated from outside, Python has no such mechanism; that is, “Private” instance variables that cannot be accessed except from inside an object don’t exist in Python.

So how do we implement a Singleton in Python without “Private” support? Overriding the __new__ function is the key. __new__ is a static method (special-cased so that you need not declare it as one) that receives the class as its first argument, and you can override it when you need to control the creation of a new instance. __new__ is the first step of instance creation: it is called first and is responsible for returning a new instance of your class. In contrast, __init__ doesn’t return anything; it is only responsible for initializing the instance after it has been created, so use __init__ when you need to control the initialization of a new instance. Below is the sample code of Singleton in Python.
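A minimal sketch of the idea, assuming that a single class-level attribute is enough to hold the instance. The class name Singleton and the attribute name _instance are illustrative choices.

```python
class Singleton(object):
    """Every call to Singleton() returns the same instance."""

    _instance = None  # class-level slot holding the single instance

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            # object.__new__ accepts no extra arguments, so pass only cls.
            cls._instance = super(Singleton, cls).__new__(cls)
        return cls._instance


if __name__ == "__main__":
    a = Singleton()
    b = Singleton()
    print(a is b)  # True
```

Two caveats with this sketch: if the class defines __init__, it still runs on every call to Singleton(), and subclasses share the parent's _instance unless they define their own.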

Python provides several modules you can use to send HTTP GET/POST requests: httplib, urllib and urllib2.

In general, httplib module defines classes which implement the client side of the HTTP and HTTPS protocols. It is normally not used directly — the module urllib uses it to handle URLs that use HTTP and HTTPS.

urllib module provides a high-level interface for fetching data across the WWW. urllib2 module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more.

1. Please use urllib2 instead of httplib if possible. urllib2 provides a high-level interface which is more convenient. From the two functions below, we can see that when using httplib, we need to specify more parameters.
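A sketch of that contrast, written for Python 3, where httplib was renamed http.client and urllib2 was merged into urllib.request. The function names and the form fields are hypothetical, and the two posting functions are only defined here, not called, because calling them needs network access.

```python
import http.client
import urllib.parse
import urllib.request


def post_with_http_client(host, path, fields):
    """Low-level: host, path, body encoding and headers are all spelled out."""
    body = urllib.parse.urlencode(fields)
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("POST", path, body, headers)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


def build_post_request(url, fields):
    """High-level: one URL plus the data; supplying data makes it a POST."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data)


def post_with_urllib(url, fields):
    """Send the request built above and return (status, body)."""
    with urllib.request.urlopen(build_post_request(url, fields), timeout=10) as resp:
        return resp.status, resp.read()
```

With the high-level interface, the caller passes only a URL and the form fields; the connection handling and the default form content type are taken care of by the library.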

The close method must be called on the result of urllib.urlopen, not on the urllib module itself.

The best approach: instead of x = urllib.urlopen(u) etc, use:

import contextlib
import urllib

with contextlib.closing(urllib.urlopen(u)) as x:
    pass  # ...use x at will here...

The with statement, and the closing context manager, will ensure proper closure even in presence of exceptions.

3. If you don’t know how to determine the POST parameters by reading the HTML source of the submit page, try using Fiddler or Wireshark to capture the HTTP traffic and find the POST data. For example, the screenshot below shows the POST data for Trend Micro’s site safety checking page when www.sina.com.cn is submitted as the query parameter.

Free Download Manager (FDM) is a popular tool that supports IE/FF to download files via HTTP, HTTPS and FTP. One highlight of its features is its “download acceleration”: FDM splits files into several sections and downloads them simultaneously. Have you ever been curious about how it is implemented? Don’t worry, this article will shed some light on the basic theory behind it.

1. HTTP HEAD Request and HTTP Response Content-Length & Accept-Ranges Headers
The HEAD method is a standard HTTP method that acts as if I’ve made a GET request, but it returns only the headers and not the body. This allows me to find out some information about the resource without actually taking the time or using the bandwidth to download it.

For example, I can read the Content-Length header of the HTTP response to determine the size of the resource.

Another important response header is Accept-Ranges.

Accept-Ranges Header

This header indicates to the Web client that the server has the capability to handle range requests. There are only two valid formats for the Accept-Ranges header that are allowed according to the definition:

Accept-Ranges: bytes
Accept-Ranges: none

These basically indicate that the Web server does and does not accept range requests, respectively.
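A minimal sketch of this probing step, assuming Python 3 and urllib.request. The function names are hypothetical, and probe is only defined, not called, because it needs network access. Note that a missing Accept-Ranges header is treated as "none" here, which is a conservative assumption.

```python
import urllib.request


def build_head_request(url):
    """Build (but do not send) a HEAD request for url."""
    return urllib.request.Request(url, method="HEAD")


def supports_byte_ranges(headers):
    """True when the response headers advertise 'Accept-Ranges: bytes'."""
    value = headers.get("Accept-Ranges", "none")
    return value.strip().lower() == "bytes"


def content_length(headers):
    """Return Content-Length as an int, or None when the header is absent."""
    value = headers.get("Content-Length")
    return int(value) if value is not None else None


def probe(url):
    """Send the HEAD request and return (size, ranges_supported)."""
    with urllib.request.urlopen(build_head_request(url), timeout=10) as resp:
        return content_length(resp.headers), supports_byte_ranges(resp.headers)
```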

If the Web server supports range requests, the client can then use the Range header described below to download partial content.

2. HTTP GET Request – Range Header
The Range header allows the HTTP client to request partial content, rather than the usual full content, by specifying a range of bytes it seeks to receive.

For example, to request the first 500 bytes of content, the following Range header should be included in the request:
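Byte positions in a Range header are zero-based and inclusive at both ends, so the first 500 bytes are requested with Range: bytes=0-499. The sketch below shows how a download manager might split a file of known size into sections and build one request per section. It assumes Python 3, the function names are hypothetical, and it is an illustration of the idea rather than FDM's actual implementation.

```python
import urllib.request


def split_ranges(total_size, sections):
    """Split total_size bytes into at most `sections` inclusive (first, last) pairs."""
    if total_size <= 0 or sections <= 0:
        return []
    chunk = -(-total_size // sections)  # ceiling division
    return [(start, min(start + chunk, total_size) - 1)
            for start in range(0, total_size, chunk)]


def range_header(first, last):
    """Format one inclusive byte range as a Range header value."""
    return "bytes=%d-%d" % (first, last)


def build_range_request(url, first, last):
    """Build (but do not send) a GET request for one section of the resource."""
    return urllib.request.Request(url, headers={"Range": range_header(first, last)})


if __name__ == "__main__":
    print(range_header(0, 499))
    for first, last in split_ranges(1000, 3):
        print(range_header(first, last))
```

A server that honors the request answers with status 206 (Partial Content) and a Content-Range header; the client can then fetch the sections in parallel and write each one at its own offset in the output file.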

Recently I was stuck on the question of what a “.LIB” file is for. Two concepts came to mind: “static linking” and “dynamic linking”. Is a “.LIB” file for static linking? But I remembered that one type of dynamic linking also requires a “.LIB” file. Are the two “.LIB” files the same? I hope today’s session can give you an answer.

Static Linking

Static libraries are used when you don’t want your final compiled application to have any dependencies. To make it standalone, the linker embeds the code from the static library in the final executable and removes any dependencies it had. A static library is just a collection of object files. The overhead can be huge, because each executable has its own copy of all the functions. With small libraries that add a few kilobytes of overhead this is fine, but what about big libraries like MFC?

Dynamic Linking

Dynamic linking refers to linking at runtime rather than at compile time. Information is still embedded in the final executable, but it is the bare minimum the loader (which loads the executable into memory at runtime) needs to identify the DLLs the program uses and load them with the application by mapping them into the process’ address space. Dynamic linking has two forms, depending on how the information is embedded in the final executable: Implicit Linking and Explicit Linking.

(1) Implicit Linking (a.k.a. static load or load-time dynamic linking)

Implicit linking occurs at compile time when an application’s code makes a reference to an exported DLL function. When the source code for the calling executable is compiled, the DLL function call translates to an external function reference in the object code. To resolve this external reference, the application must link with the import library (.LIB file) that is produced when the DLL is built.

The import library only contains code to load the DLL and to implement calls to functions in the DLL. Finding an external function in an import library informs the linker that the code for that function is in a DLL. To resolve external references to DLLs, the linker simply adds information to the executable file that tells the system where to find the DLL code when the process starts up.

To implicitly link to a DLL, executables must obtain the following from the provider of the DLL:

An import library (.LIB files) to link with. The linker creates the import library when the DLL is built.

The actual DLL (.DLL file).

(2) Explicit Linking (a.k.a. dynamic load or run-time dynamic linking)

With explicit linking, applications must make a function call to explicitly load the DLL at run time. To explicitly link to a DLL, an application must:

Call LoadLibrary() (or a similar function) to load the DLL and obtain a module handle.

Call GetProcAddress() to obtain a function pointer to each exported function that the application wants to call. Because applications call the DLL’s functions through a pointer, the compiler does not generate external references, so there is no need to link with an import library. The header file is not needed either; only the DLL is required.

Call FreeLibrary() when done with the DLL.
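The same load-then-look-up pattern can be sketched in Python with the standard ctypes module, which loads a shared library at run time and resolves exported functions by name. This is an illustration of the pattern, not the Win32 C code itself. It assumes the C runtime is reachable as "msvcrt" on Windows and through find_library("c") elsewhere; the function names load_c_runtime and lookup_abs are hypothetical.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Load the C runtime at run time (the LoadLibrary step)."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")
    # find_library may return None; CDLL(None) then opens the running
    # process itself, which already exposes the C library's symbols.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def lookup_abs(library):
    """Look up the exported function by name (the GetProcAddress step)."""
    func = library.abs
    # No header is involved, so the signature has to be declared by hand.
    func.argtypes = [ctypes.c_int]
    func.restype = ctypes.c_int
    return func


if __name__ == "__main__":
    c_abs = lookup_abs(load_c_runtime())
    print(c_abs(-5))
```

This sketch leaves the library loaded until the process exits, so it has no counterpart to the FreeLibrary() step.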

Please refer to http://msdn.microsoft.com/en-us/library/784bt7z7%28v=VS.80%29.aspx for an explicit linking example.

Most applications use implicit linking because it is the easiest linking method to use. However, there are times when explicit linking is necessary. Here are some common reasons to use explicit linking:

The application does not know the name of a DLL that it will have to load until run time. For example, the application might need to obtain the name of the DLL and the exported functions from a configuration file.

A process using implicit linking is terminated by the operating system if the DLL is not found at process startup. A process using explicit linking is not terminated in this situation and can attempt to recover from the error. For example, the process could notify the user of the error and have the user specify another path to the DLL.

A process using implicit linking is also terminated if any of the DLLs it is linked to have a DllMain() function that fails. A process using explicit linking is not terminated in this situation.

An application that implicitly links to many DLLs can be slow to start because Windows loads all of the DLLs when the application loads. To improve startup performance, an application can implicitly link to those DLLs needed immediately after loading and wait to explicitly link to the other DLLs when they are needed.

Explicit linking eliminates the need to link the application with an import library. If changes in the DLL cause the export ordinals to change, applications using explicit linking do not have to re-link (assuming they are calling GetProcAddress() with a name of a function and not with an ordinal value), whereas applications using implicit linking must re-link to the new import library.
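The recovery scenario from the list above can be sketched with Python's ctypes, where a failed run-time load surfaces as an ordinary OSError instead of terminating the process. The function name try_load and the library names are hypothetical placeholders.

```python
import ctypes


def try_load(candidates):
    """Return (name, handle) for the first library that loads, else (None, None)."""
    for name in candidates:
        try:
            return name, ctypes.CDLL(name)
        except OSError:
            # A failed load is an ordinary exception here, not process termination.
            continue
    return None, None


if __name__ == "__main__":
    name, handle = try_load(["hypothetical_plugin_a", "hypothetical_plugin_b"])
    if handle is None:
        print("no plugin library found; continuing without it")
```

The candidate list could just as well come from a configuration file or from a path the user supplies after the first attempt fails.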

Virus Bulletin (VB) is a well-known comparative testing company in the security area. For many years, Virus Bulletin has carried out independent comparative testing of anti-virus products. Its VB100 certification scheme is widely recognized within the industry. Each year VB also holds a VB conference to share the latest malware and anti-malware technologies.

VB100 tests products with its RAP (Reactive And Proactive) methodology. It uses a reactive test to gauge a product’s ability to detect the most recent known malware, and a proactive test to gauge its ability to detect new and unknown samples using heuristic and generic techniques.

During a VB100 test, however, only on-demand and on-access scans are used. The samples are not executed, so the VB100 test is a static test.

AV-Comparatives is an Austrian non-profit organization that provides independent anti-virus software tests free to the public.

AV-Comparatives covers a wider test scope. Besides on-demand/on-access scanning, it also provides a whole-product dynamic test and a removal/cleaning test. Its whole-product dynamic test mimics malware reaching and executing on a user’s machine, as it happens in the real world. This means that not only the signatures, heuristics and in-the-cloud detections are evaluated, but URL-blockers, Web reputation services, exploit-shields, in-the-cloud heuristics, HIPS and behavioral detection are also considered.

It also publishes the details of its test methodology.

Dennis Technology Labs provides only a whole-product dynamic test, and its whole-product tests are run with real URL-based attacks. Please refer to its latest report for its detailed test methodology.

I have to mention AMTSO as well. The Anti-Malware Testing Standards Organization (AMTSO) was founded in May 2008 as an international non-profit association that focuses on addressing the global need for improvement in the objectivity, quality and relevance of anti-malware testing methodologies.

From its home page, you can download the Principles and Guidelines related to security testing. Though these documents won’t give you details on how to set up a test system, the principles and guidelines are still quite useful.

I list this lab here because it is from my own country, China. Compared with the other well-known labs, I have to admit that PCSL is quite young (it was established only about two years ago). It still has to strive for a while before being widely recognized, for example by expanding its test areas, making its reports more professional, and perhaps presenting distinctive but convincing reports on malware specific to China. Anyway, this is a good start for China. At least someone is trying to set up a professional institute in China. Come on, PCSL!
