Hurricane Harvey was the first major hurricane to make landfall in the United States since Wilma in 2005, ending a record 12-year period with no major hurricanes making landfall in the country. In a four-day period, many areas received more than 40 inches (1,000 mm) of rain as the system meandered over eastern Texas and adjacent waters, causing catastrophic flooding. The resulting floods inundated hundreds of thousands of homes, displaced more than 30,000 people, and prompted more than 17,000 rescues.

The Qatar Computing Research Institute (QCRI) has been working on machine learning for UAV (unmanned aerial vehicle, i.e., drone) imagery using deep learning algorithms. For Hurricane Harvey, we were able to run our algorithm; please see the result below. And here is our report that explains how this works.

We are still looking for more data to improve our model so that we can share it with the community.

We really appreciate your help, especially from the Standby Task Force volunteers and Digital Jedis, on this research project. This paper is a very special milestone in the UAV research community because it is the first UAV imagery paper to use deep learning algorithms. We are working hard on the next research paper, based on the Philippines expedition dataset.

Thank you for your help! We have completed the activation. This dataset will be used to enhance the computer vision model. Please see the final Digital Jedis activity map below.

Update: our volunteers' activity map

Dear MicroMappers,

We need your help!

We are now launching the MicroMappers damage assessment expedition to the Philippines! The purpose of this deployment is to develop and deliver a cutting-edge machine learning algorithm that will be used to automatically detect and categorize damaged infrastructure in UAV images taken in the aftermath of natural disasters. Your help is essential to this new deep learning algorithm, which will be released to the public as a final product for humanitarian purposes.

Early this year, we introduced the new face of MicroMappers, the MicroMappers Hub. Since the Hub's release, we have been rolling out new features. The current key MicroMappers features are:

Twitter streaming and historical search data crawling. Unlike other platforms, you can collect historical data as well as current and incoming data based on keywords.

Facebook search on public groups and public pages.

GDELT world news downloads associated with any crisis. You can download 3W and crisis-image-related data. It is refreshed every 15 minutes, making it a nearly real-time dataset.

Now you can see the current list of image classifiers. If you want to define your own, please click "Request New Image Classifier"; it will redirect you to the configuration page. If you click "View Map", you can see the map with images. See below.

This is the Image Classifier configuration page. Basically, you just need to fill out the form.

As you can see, there are amazing features here. If you are not sure how to start, please check the "Tutorial" first.

We want to hear about your experience and needs. Please visit the MicroMappers Hub and give us your feedback.

Thank you,

The MicroMappers Team

Named Entity Recognition (NER) involves identifying named entities such as persons, locations, and organizations in text. NER is essential for a variety of Natural Language Processing (NLP), Information Retrieval (IR), and Social Computing (SC) applications. In this blog post, I present QCRI's state-of-the-art NER system for Arabic microblogs.

Microblog NER Challenges

NER on microblogs faces many challenges such as:

(1) Microblogs are often characterized by informality of language, ubiquity of spelling mistakes, and the presence of Twitter name mentions (ex. @someone), hashtags, and URLs;

(2) NE’s are often abbreviated. For example, tweeps (tweet authors) may write “Real Madrid” as just “the Real”;

(3) Tweeps often use brief and choppy expressions and incomplete sentences;

(4) Word senses in tweets may differ from word senses in news. For example, "mary jane" in tweets likely refers to marijuana as opposed to a person's name;

(5) Tweeps may use capitalization inconsistently (for English), where words that should be capitalized are not, and ALL CAP words are used for emphasis; and

(6) We observed that NE's often appear at the beginning or the end of tweets, and they are often abbreviated.

Most named entities observed in tweets are unlikely to have been seen during the training of an NER system.

Tweets frequently use dialects, which may lack spelling standards (ex. معرفتش and ماعرفتش are varying spellings of "I did not know"), introduce a variety of new words (ex. محد means "no one"), or make different lexical choices for concepts (ex. كويس and باهي mean "good").

Dialects introduce morphological variations with different prefixes and suffixes. For example, Egyptian and Levantine tend to insert the letter ب (sounds like "ba") before verbs in the present tense.

QCRI NER

Most work on NER relies on a sequence labeler, such as a Conditional Random Fields (CRF) labeler, that uses a variety of contextual features and gazetteers, which are large lists of named entities. Our state-of-the-art NER system builds on the same approach by presenting novel ways of building larger gazetteers, applying domain adaptation, using semi-supervised training, performing transliteration mining, and employing cross-lingual English-Arabic resources such as Wikipedia. We train a CRF sequence labeler with these enhancements.
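
The feature-extraction side of such a CRF labeler can be sketched as follows. This is a minimal illustration, not QCRI's actual feature set: the function name, the toy gazetteer, and the chosen features are all assumptions, and a real system would pass these dictionaries to a CRF toolkit for training.

```python
# Sketch of per-token features for a CRF sequence labeler (illustrative only).

def token_features(tokens, i, gazetteer):
    """Contextual, orthographic, and gazetteer features for token i."""
    tok = tokens[i]
    return {
        "word": tok,
        "prefix2": tok[:2],          # morphological clue (e.g., dialectal b- prefix)
        "suffix2": tok[-2:],
        "is_mention": tok.startswith("@"),   # Twitter name mention
        "is_hashtag": tok.startswith("#"),
        "in_gazetteer": tok in gazetteer,
        "prev_word": tokens[i - 1] if i > 0 else "<S>",
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "</S>",
    }

gaz = {"Doha", "NASA"}                      # toy gazetteer
sent = ["@user", "NASA", "lands", "in", "Doha"]
features = [token_features(sent, i, gaz) for i in range(len(sent))]
```

During training, each token's feature dictionary would be paired with its entity label (e.g., in BIO format) so the CRF can learn label transitions.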

Using Arabic Wikipedia

Since building larger gazetteers can positively impact NER, we used Wikipedia to build large gazetteers. To do so, we used category names to filter Wikipedia titles down to those that would constitute names of persons, locations, and organizations. Here are sample words (translated into English) that we used for filtering:

We also used page redirects (alternative page names) to expand the gazetteers. The resultant gazetteer had 70,908 locations, 26,391 organizations, and 81,880 persons.
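
The category-based filtering described above can be illustrated with a small sketch. The keyword lists and page data here are toy examples; the real filter lists and the Wikipedia dump are far larger.

```python
# Toy sketch: bucket Wikipedia titles into gazetteers by category keywords.

CATEGORY_KEYWORDS = {
    "PER": {"births", "deaths", "people"},
    "LOC": {"cities", "countries", "capitals"},
    "ORG": {"companies", "organizations", "agencies"},
}

def classify_title(categories):
    """Return an entity type if any category name matches a filter keyword."""
    for etype, keywords in CATEGORY_KEYWORDS.items():
        for cat in categories:
            if any(kw in cat.lower() for kw in keywords):
                return etype
    return None  # title is kept out of the gazetteers

pages = {
    "Doha": ["Capitals in Asia", "Cities in Qatar"],
    "NASA": ["Space agencies", "Government agencies of the United States"],
    "Physics": ["Branches of science"],
}
gazetteers = {title: classify_title(cats) for title, cats in pages.items()}
```

Redirect pages (alternative titles) would then be added under the same entity type as their target page.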

English DBpedia:

DBpedia is a large collaboratively-built knowledge base in which structured information is extracted from Wikipedia, and it contains 6,157,591 Wikipedia titles belonging to 296 types. Types vary in granularity, with each Wikipedia title having one or more types. For example, NASA is assigned the following types: Agent, Organization, and Government Agency. In all, DBpedia includes the names of 764k persons, 573k locations, and 192k organizations. Of the Arabic Wikipedia titles, 254,145 have Wikipedia cross-lingual links to English Wikipedia, and of those English Wikipedia titles, 185,531 have entries in DBpedia. We used the DBpedia types as features for the NER system.
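
The Arabic-title, English-title, DBpedia-types chain can be sketched as a simple lookup (all data here is illustrative; the real mappings come from the Wikipedia cross-lingual links and the DBpedia dump):

```python
# Toy sketch of turning DBpedia types into NER features via cross-lingual links.

cross_lingual = {"ناسا": "NASA"}   # Arabic Wikipedia title -> English title
dbpedia_types = {"NASA": ["Agent", "Organization", "GovernmentAgency"]}

def dbpedia_features(arabic_title):
    """Binary features, one per DBpedia type of the linked English page."""
    en_title = cross_lingual.get(arabic_title)
    types = dbpedia_types.get(en_title, []) if en_title else []
    return {f"dbpedia_type={t}": True for t in types}

feats = dbpedia_features("ناسا")
```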

Cross-Lingual Capitalization:

As I mentioned earlier, Arabic lacks capitalization, and Arabic names are often common Arabic words. For example, the Arabic name "Hasan" means "good". To capture cross-lingual capitalization, we used a machine translation phrase table that was built from large amounts of parallel Arabic-English text in which the case was not folded on the English side. Then, given an Arabic word, we would look up its English translations and observe the likelihood that they are capitalized.
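
A hedged sketch of this lookup, using toy phrase-table counts (the real table is derived from large parallel corpora, and the exact scoring is an assumption):

```python
# Toy sketch: estimate P(capitalized) for an Arabic word from a truecased
# phrase table. "حسن" (Hasan) translates both as a name and as "good".

phrase_table = {  # Arabic word -> (English translation, count) pairs
    "حسن": [("Hasan", 7), ("good", 3)],
}

def capitalization_likelihood(word):
    """Fraction of translation mass whose English side is capitalized."""
    entries = phrase_table.get(word, [])
    total = sum(count for _, count in entries)
    if total == 0:
        return 0.0
    capitalized = sum(count for en, count in entries if en[:1].isupper())
    return capitalized / total

p = capitalization_likelihood("حسن")
```

A high likelihood suggests the word is behaving as a name in English, which becomes a real-valued feature for the CRF.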

Cross-Lingual Transliteration:

Many named entities, particularly persons and locations, are often transliterated. We looked up the translations of Arabic words in the aforementioned phrase table and then used an in-house transliteration miner to determine whether the English and Arabic translations are also transliterations of each other. If they are, we used the transliteration probability as a feature.
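
The actual transliteration miner is model-based; as a stand-in, a crude character-mapping overlap conveys the idea of scoring how "transliteration-like" a translation pair is (the romanization table and scoring are entirely illustrative):

```python
# Illustrative stand-in for transliteration mining: score how well the rough
# romanization of an Arabic word appears, in order, inside an English word.

AR2LAT = {"م": "m", "د": "d", "ر": "r", "ي": "i", "ن": "n", "ا": "a"}

def translit_score(ar_word, en_word):
    """Fraction of mappable Arabic letters matched in order in the English word."""
    en = en_word.lower()
    pos, hits = 0, 0
    for ch in ar_word:
        lat = AR2LAT.get(ch)
        if lat is None:
            continue
        j = en.find(lat, pos)
        if j >= 0:
            hits += 1
            pos = j + 1
    mappable = sum(1 for ch in ar_word if ch in AR2LAT)
    return hits / mappable if mappable else 0.0
```

"مدريد" vs. "Madrid" scores high (a likely transliteration), while "مدريد" vs. "good" scores low; the miner's probability plays this role as a CRF feature.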

Using Domain Adaptation:

Aside from tagging microblog text with named entities, we mixed tagged news text with tagged microblog text to make use of the large amount of news training data.

Semi-Supervised Training:

Basically, we used our best NER system to tag a large corpus of microblogs. Our intuition was that if we automatically tag a large set of tweets, an NE may be tagged correctly multiple times. The automatically identified NE's can then be used as a "new gazetteer."
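
The gazetteer-mining step above can be sketched as frequency filtering over the system's own predictions (the count threshold and data are assumptions for illustration):

```python
# Minimal sketch of self-training: entities tagged repeatedly by the existing
# NER system on unlabeled tweets are promoted into a "new gazetteer".

from collections import Counter

def mine_gazetteer(auto_tagged, min_count=2):
    """auto_tagged: iterable of (entity_text, entity_type) predictions.
    Keep entities predicted at least min_count times; rare (likely noisy)
    predictions are dropped."""
    counts = Counter(auto_tagged)
    return {ent: etype for (ent, etype), c in counts.items() if c >= min_count}

predictions = [("Doha", "LOC"), ("Doha", "LOC"), ("xyz", "PER"),
               ("Real Madrid", "ORG"), ("Real Madrid", "ORG")]
new_gaz = mine_gazetteer(predictions)
```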

How Good Is Our NER System?

The QCRI NER system is considered state-of-the-art for Arabic microblogs. Table 1 reports on the evaluation results for the NER system. We performed the evaluation on a set of 1,423 tweets containing nearly 26k tokens. The tweets were randomly selected from the period of Nov. 23-27, 2011.