Two caveats. First, as mentioned in the last post, FastText is a word embedding, so word sense disambiguation is not taken into account. Quezon City, the largest city in Metro Manila, is therefore not properly represented in the model, since the quezon token is an amalgam of Quezon province, Quezon City, and the president Manuel Quezon. Second, bigrams are not represented in the unigram FastText model. The city San Juan is a bigram, so it does not exist in the model. Because of these issues, Quezon City and San Juan are unfortunately not included in the analysis.

What’s the goal of this analysis? A FastText model encodes knowledge that is implicit in the corpus it was trained on. In the case of the pretrained model, the corpus is Tagalog Wikipedia. By testing some obvious associations, we can check whether the knowledge representation in the embedding model is satisfactory. By testing non-obvious associations, we can detect implicit biases contained in the input corpus.

Let’s start with something obvious.

Paliparan (Airport)

As expected, the south part of Metro Manila lights up since this is where the airport is located. The peak of the association is Parañaque (followed by Pasay), which is where the NAIA terminals can be found.

River

For river, Marikina, Pasig and Mandaluyong all light up. The first two have rivers associated with them (Marikina River and Pasig River). Mandaluyong is neighbors with Pasig and is also bordered by the river.

Trapiko (Traffic)

Interestingly, Pasay is most associated with traffic. But this is not such a big surprise when we think about all the recent developments in the city (Resorts World, casinos) and also the proximity to the airports. Makati and Taguig, which I thought would have the highest association with traffic, aren’t highly associated based on the model.

Sosyal (Classy*)

I put an asterisk on the translation because there is really no direct translation for sosyal. Classy as sosyal is somewhat debatable. Somewhat expectedly, Makati and Muntinlupa are peak overall. Makati has Forbes and high-end shopping malls, while Muntinlupa has Alabang. All of these are highly associated with the sosyal subculture.

Urbano (Urban)

For urban, Makati, Muntinlupa and Marikina are in the highest quantile. Makati in particular is expected since Makati is a central business district. Muntinlupa and Marikina’s appearance at the top are quite surprising since these two cities aren’t what I would put on the top of the list of most urbanized. I would give the mantle to Taguig (for Bonifacio Global City) or Mandaluyong (for Ortigas).

Final Thoughts

What we’re probing in this analysis is the implicit associations contained in the Tagalog Wikipedia corpus. What would be more interesting is to capture biases in the minds of actual Filipinos.

How can we carry this out? By training a model on text from actual Filipinos. There are a myriad of sources: Reddit, online forums, Facebook. That’s the subject of the next article in this series.

In the last article, we converted movie dialog into numbers using a bag-of-words approach. Here we go in another direction and use an embedding model to represent text in a dense manner. So instead of capturing stylistic tendencies of Jesse and Celine (via word choice in a bag-of-words model), we will try to capture the semantic differences between their words (via an embedding model).

Word Embeddings

A word embedding model converts words into vectors. By training a neural network model to predict context from a target word (or vice versa) over a large corpus, the model implicitly returns embedding vectors that retain the semantics of the tokens in the corpus. One interesting thing about the model is that it is able to learn analogies. The most famous equation “man” + “queen” – “woman” = “king” can be recovered using vector algebra on the embedding. For some background on word embeddings, check out my other post on Geospatial Associations with the Tagalog Embedding.

For this study, we will be using the FastText English model. As the model is extremely large, we clip the model to only include the top 100K words, resulting in a smaller memory footprint.
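As a rough illustration of the clipping step, here is a minimal sketch. It assumes the vectors come in the plain-text `.vec` layout (a header line with the word count and dimension, then one word per line followed by its values, ordered from most to least frequent). The function name `load_top_vectors` and the tiny in-memory sample are hypothetical and only for illustration.

```python
import io


def load_top_vectors(lines, limit):
    """Read at most `limit` word vectors from an iterator over .vec-style lines."""
    header = next(lines).split()
    dim = int(header[1])  # header is assumed to be "<word count> <dimension>"
    vectors = {}
    for line in lines:
        if len(vectors) >= limit:
            break  # rows are assumed to be sorted by frequency, so we can stop early
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory stand-in for a real .vec file (the values are made up).
sample = io.StringIO("3 2\nthe 0.1 0.2\nof 0.3 0.4\nrare 0.5 0.6\n")
clipped = load_top_vectors(iter(sample), limit=2)
print(list(clipped))
```

If the model is loaded with Gensim instead, `KeyedVectors.load_word2vec_format` accepts a `limit` argument that serves the same purpose.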

State-of-the-art models for sequence and text classification usually rely on complex, computationally taxing architectures like recurrent neural networks (RNNs). Iyyer et al. showed that a simple neural network that takes the averaged embedding vector of the input, the Deep Averaging Network (DAN), yields competitive performance. Note that the DAN does not take the ordering of words in the input text into consideration (due to the averaging procedure), while RNN-based models do.

For the dataset, we will be using the same one as in the last study: Before Sunrise and Before Sunset dialog. There are two columns in the input file: a text column containing the dialog and a label column containing either ‘Jesse’ or ‘Celine’. We will be performing a different filtering procedure on the text compared to the last study. After tokenizing the text with the NLTK tokenizer and removing stop words, we retain only dialog that has 3 or more tokens. We do this since short texts are often just retorts or remarks. We are mostly interested in “sense” since we’re considering a semantic model, and shorter texts (like “yup” or “great”) often have little to no semantic content. The dialog representation is then obtained as the average of the embedding vectors of the tokens, following the Deep Averaging Network model. In total, 791 lines of dialog constitute our dataset.
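The filtering and averaging steps can be sketched roughly as follows. This is not the code used in the study: a simple whitespace tokenizer stands in for the NLTK tokenizer, and the stop word list, the toy vectors, and the function names are all made up for illustration.

```python
import string

# Illustrative stop word list; the study uses a proper stop word list with NLTK.
STOP_WORDS = {"i", "the", "a", "to", "you", "is", "it", "and"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]


def average_embedding(text, vectors, min_tokens=3):
    """Return the mean vector of the known tokens, or None if the line is filtered out."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    if len(tokens) < min_tokens:
        return None  # too short: likely a retort or remark
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no token has an embedding
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Made-up 2-dimensional vectors standing in for real embeddings.
toy_vectors = {"train": [1.0, 0.0], "vienna": [0.0, 1.0], "morning": [1.0, 1.0]}
long_line = average_embedding("The train to Vienna leaves in the morning.", toy_vectors)
short_line = average_embedding("Yup.", toy_vectors)
print(long_line, short_line)
```

Because the vectors are averaged, shuffling the words of a line yields exactly the same representation, which is the order-insensitivity noted above.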

For this analysis, we try out different model architectures for the neural network. We perform nested cross validation to test out models with 1 to 5 inner layers. We split the dataset 80-20, with 80% being train and validation and 20% being test. From the 80% we perform stratified 4-fold cross-validation to test out different architectures.
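To make the splitting scheme concrete, here is a minimal sketch of a stratified 80-20 split followed by stratified 4-fold cross-validation. It uses only the standard library, and the helper `stratified_folds` and the toy labels are assumptions for illustration, not the code behind the results.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each index to one of k folds, keeping label proportions similar."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal indices out round-robin
    return folds


# Toy labels mirroring the roughly 51:49 split between the two speakers.
labels = ["Jesse"] * 51 + ["Celine"] * 49

# Five stratified folds: one fold (about 20%) is held out as the test set.
outer = stratified_folds(labels, k=5)
test_idx = outer[0]
dev_idx = [i for fold in outer[1:] for i in fold]

# Stratified 4-fold cross-validation on the remaining ~80%.
# Note: the inner folds contain positions into dev_labels, not into labels.
dev_labels = [labels[i] for i in dev_idx]
inner = stratified_folds(dev_labels, k=4)
print(len(test_idx), len(dev_idx), [len(f) for f in inner])
```

Each candidate architecture would be trained on three inner folds and scored on the fourth, rotating through all four, and only the selected architecture would then be evaluated on the held-out test indices.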

Our evaluation metric is accuracy. Note that the two classes (Jesse’s dialog, Celine’s dialog) are almost evenly split, with a 51:49 class imbalance.

Model Results

The accuracy of the model for different numbers of hidden layers is shown below. As we can see, validation accuracy is highest (~60%) for two hidden layers. Following standard model selection protocol, we pick this as our final model. Evaluating it on the test set, we see about 55% accuracy for two layers. Three hidden layers seem to perform best on the test set, but choosing a model based on test performance would defeat the purpose of holding that set out.

Considering the class imbalance 51:49, we can confidently say that our model at least does better than random guessing.

As hypothesized in the last article, the model doesn’t really perform that well compared to the bag of words approach. Style over semantics seems to be the predominant difference between Jesse’s words and Celine’s. However, one must note that the comparison between this study and the last is not one that is completely fair. For one, the training and evaluation datasets aren’t the same since we implemented a different filtering procedure to come up with the training and test sets.

Conclusion

From first impressions, it seems that a stylistic model (based on the actual words used, à la bag of words) trumps a semantic model (based on meaning, à la embeddings). It must be noted that these two models (bag of words, embeddings) capture different aspects of the dialog and might not be correlated with one another. It would be interesting to see whether a combined model improves on either one, and how big a jump in accuracy it would produce. That will be the subject of the next article in the “Jesse or Celine?” series.

In today’s day and age, dominated by technology and mass consumerism, data is almost literally everywhere.

Just to give you an idea of the amount of data out there, IBM reports that in 2012, 2.5 billion GB of data were generated on average every day. And that was 7 years ago, which in technology years is practically a whole era. According to Forbes, at this wild, constantly increasing pace of data growth, by 2020 there will be 1.7 megabytes of new information generated every second for every human being.

And it’s not just the amount of data that’s out there, but what it stands for – money, lots of it.

This is why JCU Online has compiled some essential data on the three main fields that together make up the professional Data landscape – Data Science, Big Data, and Data Analytics, so you can decide if you’d be interested in pursuing a career in any of them and how you’d approach it.

The term “science” is more than befitting. Data Science is the most refined field of the three and requires the tightest niche expertise to process and sift through raw data in order to identify highly specific and valuable insight amidst all the background noise. Data Science is concerned with highly individual data.

In short, Data Science encompasses all the complex processes, tools, techniques, and mechanisms that go into the cleansing, analysis, and preparation of data, which ultimately results in clear, pinpoint estimates and highly proactive directions that help businesses drive growth.

Internet search is perhaps the most important and ubiquitous application of Data Science, and ironically, it tends to be the most inconspicuous as well. Have you ever wondered how Google can produce a plethora of accurate search results in a matter of milliseconds? Through Data Science.

Digital Advertisements

This is another big one. You’ve probably noticed, and perhaps even felt watched, when a digital ad that is weirdly relevant to you pops up in your feed or search screen. Something that specific can’t be a coincidence, right? Indeed, it isn’t. It’s the result of the algorithms that are Data Science’s bread and butter. Digital marketing as a whole relies heavily on Data Science, which is why it’s become so successful over the last few years.

Recommender Systems

Recommendations are a huge part of the user experience, especially in fields like e-commerce. Those recommendations aren’t just some half-random, rounded up suggestions – they are strategically based on your search history and demands and have been selected amidst a billion other options.

Big Data

While Data Science drives business growth by focusing on one individual at a time, Big Data is concerned with the masses. It deals with unimaginably large data that certain businesses drown in on a daily basis. If it wasn’t for Big Data, all this information would go unprocessed and unutilized, resulting in loss of insight and opportunities.

Big Data examines both structured and unstructured information, the former coming from transaction data, Relational Database Management Systems, etc., while the latter is derived from emails, blogs, social media activity, etc. There’s also semi-structured data that is obtained from the likes of text files and system logs.

Applications of Big Data

Financial Services

All institutions that provide financial services deal with endless amounts of data. They employ Big Data to create a structure to the chaos and identify patterns in key areas such as customer, compliance, fraud, and operational analytics.

Communications

The communication industry is booming, and the competition for subscribers – whether it’s about gaining or retaining them and/or upgrading their worth, is fierce. The key to success is streamlining the analysis of all this customer and machine-generated data and pinpointing strategies that work on large scales.

Retail

Again, brick and mortar stores can’t focus too much on every single individual, as it wouldn’t be cost-efficient. Instead, they need to pin down more universal strategies that draw in large groups of consumers. Big Data helps retail businesses stay on top of all the data that comes from customer transactions, social media, loyalty programs, etc.

Data Analytics

Of the three, Data Analytics is probably the most basic, but utilizing it to its full potential can tremendously increase profits.

Data Analytics employs algorithmic and mechanical processes to juxtapose a number of different data sets in order to spot connections and patterns, drawing conclusions on consumer behavior and companies’ efficiency.

Applications of Data Analytics

Healthcare

Healthcare institutions are usually torn between providing quality care and providing time- and cost-efficient care. Data Analytics helps establishments gain a more in-depth understanding of patient flow, equipment use and treatment from a business standpoint in order to help them optimize their operation and reach a happy medium.

Travel

The user and buying experience determine the winning and losing companies in the travel industry. Through Data Analytics, companies can delve into consumer psychology and preferences and exploit them.

Energy Management

This is another field where Data Analytics is a game-changer, helping companies optimize their energy use and distribution, achieve greater automation, and consequently cut down operational costs.

The Data business is showing no signs of slowing down, and with a career in any one of these three fields, you’re equipped to ride its momentum in full force.

NLP has progressed tremendously over the past few years. One of the most exciting advancements in the field is the concept of word embeddings. When dealing with text in a machine learning setting, one has to find a way to turn words into numbers, since machine learning models are only able to take numbers as inputs. Before word embeddings came along, the standard way to vectorize text was to one-hot encode it, which meant adding one dimension for each word in the vocabulary of the input text and then setting that dimension to 1 if the input text contains the word and 0 otherwise. Given a very large set of text, this implies a very high-dimensional numeric representation of text data. This in turn leads to a plethora of problems: overfitting and the curse of dimensionality, but most importantly (from a semantic perspective) the semantic interrelatedness of words is not carried over. In other words, the one-hot representation doesn’t know that cat and kitten are semantically related, nor does it know that man and woman are related.
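A small sketch makes the problem visible. The two toy sentences and the helper names `build_vocab` and `encode` are made up for illustration.

```python
def build_vocab(texts):
    """Give every distinct word in the texts its own dimension."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}


def encode(text, vocab):
    """Set a dimension to 1 if the text contains that word, 0 otherwise."""
    vec = [0] * len(vocab)
    for w in text.lower().split():
        if w in vocab:
            vec[vocab[w]] = 1
    return vec


texts = ["the cat sat", "the kitten sat"]
vocab = build_vocab(texts)
cat = encode("cat", vocab)
kitten = encode("kitten", vocab)

# The two words share no dimension, so their dot product is zero:
# the encoding carries no hint that cat and kitten are related.
overlap = sum(a * b for a, b in zip(cat, kitten))
print(vocab, cat, kitten, overlap)
```

The vector length equals the vocabulary size, so a realistic corpus with tens of thousands of distinct words produces vectors that are just as long and almost entirely zero.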

Enter word embeddings. The one-hot representation of text is a type of “sparse” representation, since the resulting representation has a lot of zeros. Word embeddings on the other hand are a type of “dense” representation, which means that the representation has dimension values that are continuous.

Remember in a one-hot representation that each dimension pertains to a word in the input text vocabulary? In a word embedding, each dimension is an implicit concept learned from training. One dimension could be “masculinity” or it could be “cuteness”. Usually the interpretation of the dimensions isn’t really that big a deal. What is important is that semantically related words are close to one another in the vector space, and we can use linear algebra to query the model.

How do we train word embeddings? I don’t want to go over the details since there are lots of awesome resources on the net, but the gist is that we scan over a massive corpus of text (like Wikipedia) and train a neural network to predict the context of each target word, or vice versa. By doing so, we learn the statistics of each word in the corpus and pick up on the meaning of each word from its neighbors. The resulting output of training is a vector space in which words are embedded. Standard vector algebra is valid in this space, and so we can recover the famous analogy king + woman – man = queen.
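The analogy arithmetic can be sketched with hand-made vectors. The 2-dimensional values below are invented so that one axis loosely stands for royalty and the other for gender; real embeddings have hundreds of dimensions and only approximately satisfy such relations. For simplicity this sketch picks the nearest word by Euclidean distance, whereas embedding toolkits typically rank candidates by cosine similarity.

```python
# Made-up vectors: first axis ~ "royalty", second axis ~ "gender".
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [0.8, 1.0],
}


def analogy(a, b, c, vectors):
    """Return the word closest to a + b - c, excluding the three query words."""
    target = [x + y - z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(
        candidates,
        key=lambda w: sum((p - q) ** 2 for p, q in zip(vectors[w], target)),
    )


result = analogy("king", "woman", "man", toy)
print(result)
```

Excluding the query words from the candidates matters in practice, since the result vector often lies closest to one of the inputs.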

Associations

Now, the focus of this article is not an introduction to the theory of word embeddings but what we can extract from them: associations. Word embeddings are really cool since we can implicitly study associations by checking the similarity of two words in the model. With associations, there remains one important question: whose association? This is where psychology intersects with natural language processing. Remember that word embeddings are trained on a large input corpus? An embedding model captures associations and implicit biases from the text, which implies that it correlates with two audiences: the writers and the readers. Naturally, writers impart their own associations on the things they write, so a pro-Trump writer would probably associate Trump with some positive aspects. Looking at it from the other side, the reader is the consumer of the text and consequently will be influenced by what they read.

For associations, we will use the standard measure: cosine similarity. Cosine similarity measures the angle between two vectors. In an embedding model, meaning is represented in terms of direction, and so the cosine similarity is a measure of the relative direction (i.e. the angle) between two vectors. The score varies from -1 to 1, wherein -1 implies an opposite sense (since the two vectors are antiparallel) and 1 implies complete overlap in meaning (since the two vectors are perfectly aligned).
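For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. A minimal standard-library sketch, with made-up vectors covering the three cases described above:

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])      # unrelated directions
antiparallel = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction
print(parallel, orthogonal, antiparallel)
```

Note that the score depends only on direction: scaling either vector by a positive constant leaves it unchanged, which is why the first pair scores (up to floating-point rounding) 1 even though the vectors differ in length.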

FastText is one of the newer embedding models in the wild. It builds on the Word2vec model, but instead of looking only at whole words in the input text, it also looks at character n-grams, the building blocks of words. FastText was developed by Facebook, and they were cool enough to build models for a ton of languages, one of which is Tagalog. These FastText models are built on CommonCrawl and Wikipedia. In this article we will probe the implicit associations captured by the Tagalog model.

What are the geographic associations implicit in the Tagalog model? We look at associations on the provincial level in this article and then zoom in to the city level for Metro Manila in the next one. Will the model recover the fact that the North is Marcos territory? That in Mindanao there are a lot of Muslims? That Palawan is vacation territory?

Technicalities

There are 82 provinces in the Philippines, and not all of them exist as tokens in the Tagalog model. This puts into light some of the technical problems at the forefront of NLP research: polysemy and multi-word embeddings.

Polysemy is the coexistence of many possible meanings of a word. For example, Quezon as a token could refer to the province, the city in Metro Manila, or the president Manuel Quezon. However, since an embedding model is a function from words to vectors, it can only return one vector — all meanings are squished together into one vector which muddles the overall sense. The easiest way to disambiguate the different senses of Quezon is to preprocess mentions of Quezon in the input corpus into Quezon_province, Quezon_city, and Quezon_president before feeding into the training algorithm. But since we are using a pre-trained unigram model, we really can’t do anything about polysemy.

The Tagalog FastText model is a unigram embedding, so only words are mapped into vectors. This means that multi-word concepts and places like New York and Agusan del Norte aren’t represented in the model. For provinces like Agusan del Norte and Lanao del Sur, we take the easy way out and represent them as the base words Agusan and Lanao, respectively.
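The base-word workaround can be sketched as a small lookup helper. The lowercase tokens, the toy vocabulary, and the function name `province_token` are assumptions made for illustration.

```python
# Directional suffixes dropped to fall back to the base word.
SUFFIXES = (" del norte", " del sur")


def province_token(name, vocab):
    """Map a province name to a single token, or None if no usable token exists."""
    token = name.lower()
    for suffix in SUFFIXES:
        if token.endswith(suffix):
            token = token[: -len(suffix)]
    if " " in token or token not in vocab:
        return None  # still multi-word, or simply missing from the model
    return token


# Toy vocabulary standing in for the model's real token list.
vocab = {"agusan", "lanao", "cebu", "quezon"}
names = ["Agusan del Norte", "Lanao del Sur", "Cebu", "Nueva Ecija"]
tokens = {name: province_token(name, vocab) for name in names}
print(tokens)
```

One side effect of this shortcut is that paired provinces such as Agusan del Norte and Agusan del Sur end up sharing one vector, so they will always receive identical scores.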

For this study we will be combining two awesome packages in Python: the NLP package Gensim and the geospatial package Geopandas. We will be loading and querying the Tagalog Fasttext model with the first and cutting and plotting the Philippine shapefile with the second. What we get out of this study are choropleth maps showing the degree of association each province has with an attribute of interest.
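The scoring step behind each map can be sketched independently of either package. In the sketch below, `similarity` is any function that returns a similarity for two tokens; with Gensim, the `similarity` method of a loaded `KeyedVectors` object could play that role. The toy similarity table and its values are invented for illustration and are not results from the model.

```python
def province_associations(attribute, province_tokens, similarity, vocab):
    """Score each province against an attribute word; None if a token is missing."""
    scores = {}
    for province, token in province_tokens.items():
        if attribute in vocab and token in vocab:
            scores[province] = similarity(attribute, token)
        else:
            scores[province] = None  # province has no usable token in the model
    return scores


# Invented similarity values standing in for a real embedding model.
toy_sims = {("turista", "palawan"): 0.4, ("turista", "cebu"): 0.3}


def toy_similarity(a, b):
    return toy_sims[(a, b)]


toy_vocab = {"turista", "palawan", "cebu"}
province_tokens = {"Palawan": "palawan", "Cebu": "cebu", "Nueva Ecija": "nueva ecija"}
scores = province_associations("turista", province_tokens, toy_similarity, toy_vocab)
print(scores)
```

The resulting scores can then be attached as a column to the provinces in a GeoDataFrame and drawn with `GeoDataFrame.plot(column=..., legend=True)` to get a choropleth.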

Let’s look at something basic first.

Bisaya (Visayan)

Querying the model for Bisaya, we can see that the map activates mostly in Central and Southern Philippines (Visayas and Mindanao). We also see some activation up north, but the majority is in the south, where Bisaya is the predominant group and language.

Marcos

If we check for provincial associations for Marcos, we see that the North (Ilocos area) activates. This is where Marcos is from and where most of his ardent supporters can be located.

MILF

For MILF (Moro Islamic Liberation Front), Maguindanao and some other Southern provinces activate.

Igorot

For Igorot, an indigenous tribe up North, Ifugao and the neighbors activate.

Turista (Tourist)

If we query for tourist, we recover the fact that Palawan is a tourist spot. But we get nontrivial matches for Cagayan and Surigao.

Kaibigan (Friend)

For something abstract like friend, we see that Cebu takes the top spot. Is Cebuano hospitality and friendship embedded in the model? Basilan and Rizal also have high associations.

Kaaway (Enemy)

Looking at enemy, we see that Bataan and Cavite are top matches. Thinking about it, Bataan and Cavite were major landmarks during the wartime period. That might have led to high association with enemy.

Ganda (Beauty)

For beauty we get matches in Palawan (beautiful beaches, probably), Kalinga and Sultan Kudarat. These are quite nontrivial matches.

Conclusion

As we can see, the Tagalog FastText model contains some trivial associations that we might expect (like Marcos up north and MILF down south), but we also recover some implicit and abstract associations that aren’t really trivial to explain like Cebu with friend.

As I mentioned in the first section, the model captures associations of the producers and consumers of the input text corpus. As the FastText model was mostly built on Wikipedia data, it only captures associations on the general level. By building a model on social media or forum text, for example, we can capture what’s in the thought process of the modern Filipino on the web. By building a model on Filipino news sites, we can capture and measure the implicit biases of the media.

In the next post, I drill down on Metro Manila and probe city-level associations in the FastText model.

Artificial intelligence (AI) has been a buzzword in the industry for the past few years. However, the field of AI actually dates back to the middle of the 20th century. In 1950, Alan Turing posed his famous “Turing Test” to probe the limits of computational cognition.

Techjury recently released an infographic detailing the history of AI from 1943 up to 2014, when the Turing Test came full circle as the chatbot Eugene Goostman was reported to be the first machine to pass the test.

You might be aware that your Bitcoin or other cryptocurrency transactions have a possible taxable impact. However, you might not know exactly how to report them. Read on for more guidance on cryptocurrency taxes.

What is Bitcoin?

Bitcoin is a worldwide payment system where users buy virtual currency using an exchange. Bitcoin is stored in a digital wallet and can be transferred using a mobile app. No bank or other intermediary institution is involved.

Bitcoin can be used as a digital currency to send or receive funds, pay for goods or services, or simply for investment. Transactions are anonymous and are tracked only via the digital wallet identifiers on a public ledger. Although it was originally associated with illicit operators, Bitcoin is now accepted as payment by mainstream companies.

Enter the IRS

The Internal Revenue Service (IRS) isn’t blind to Bitcoin and provided guidance about “convertible virtual currency” in its Notice 2014-21.

The IRS defined convertible virtual currency as virtual currency that has an equal value in real currency, or that is a substitute for real currency.

The IRS specifically referred to Bitcoin as a type of convertible virtual currency that can be digitally traded. In addition, you can buy or exchange virtual convertible currencies into U.S. dollars or other real or virtual currencies. However, virtual currency itself does not have legal tender status in the U.S.

Tax Liability

“Though cryptocurrencies seem like a brand-fangled new investment, one with which our, by comparison, antiquated tax system can’t compete, they are actually taxed like pretty much any other mundane item,” said Mark Durrenberger, author of The Modern Day Millionaire.

If you sell, exchange, or use convertible virtual currency to pay for goods or services, you might have a tax liability. For tax purposes, the IRS treats convertible virtual currencies as property. If you receive Bitcoin as payment for goods or services you provide, then when you compute your gross income, you must include the fair market value of Bitcoin in U.S. dollars as of the date you received the Bitcoins.

Durrenberger gave the following example:

“If you buy Bitcoin for $100, and later sell it for, say, $1,000, you would owe capital gains taxes on that $900 gain. If you held that Bitcoin for less than one year, the tax rate would be whatever rate you pay on your regular income. If you held it for longer than one year before you sold, you are taxed at the more favorable (i.e., lower) long-term capital gains rates,” Durrenberger said.
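The arithmetic in this example can be sketched in a few lines. The function names and the dates are hypothetical, and the holding-period check is a simplified calendar rule assumed for illustration (more than one year counts as long-term, anything else as short-term). No tax rates are included since none are given here.

```python
from datetime import date


def capital_gain(cost_basis, proceeds):
    """Gain (or loss, if negative) in U.S. dollars."""
    return proceeds - cost_basis


def holding_term(acquired, sold):
    """Return 'long-term' if held for more than one year, else 'short-term'."""
    try:
        one_year_later = acquired.replace(year=acquired.year + 1)
    except ValueError:
        # Acquired on February 29: fall back to February 28 of the next year.
        one_year_later = date(acquired.year + 1, 2, 28)
    return "long-term" if sold > one_year_later else "short-term"


gain = capital_gain(100, 1000)
quick_sale = holding_term(date(2017, 1, 15), date(2017, 11, 1))
patient_sale = holding_term(date(2017, 1, 15), date(2018, 6, 1))
print(gain, quick_sale, patient_sale)
```

The same $900 gain is therefore taxed differently depending only on the two dates, which is one more reason to record them carefully.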

Fair Market Value

How would you determine the fair market value of Bitcoin? “It can get a bit tricky as the value of Bitcoin jumps and dips constantly and those changes can be quite drastic at times,” said David Hryck, a tax lawyer and partner at Reed Smith in New York City. “You will have to convert the Bitcoin value to U.S. dollars as of the date each payment is made.”

In this world of anonymous payments, recordkeeping of your transactions might be a challenge. “Make sure you keep careful records of the dates and value,” Hryck said.
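A record of this kind can be as simple as a list of dated entries. The sketch below is a hypothetical layout, and the amounts and prices are invented for illustration. Each payment is converted to U.S. dollars using the price on the date it was received.

```python
from datetime import date
from decimal import Decimal

# (date received, amount in BTC, USD price per BTC on that date) - invented values
payments = [
    (date(2019, 3, 1), Decimal("0.5"), Decimal("4000")),
    (date(2019, 6, 1), Decimal("0.25"), Decimal("8000")),
]


def fair_market_value(amount_btc, usd_price):
    """Value of one payment in U.S. dollars on the date it was received."""
    return amount_btc * usd_price


# Income to report is the sum of each payment's value on its own date,
# not the total amount of Bitcoin multiplied by a single price.
total_usd = sum(
    (fair_market_value(amount, price) for _, amount, price in payments),
    Decimal("0"),
)
print(total_usd)
```

`Decimal` is used instead of floating-point numbers so that currency amounts are not distorted by binary rounding.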

Independent Contractors

If a company or individual pays you in Bitcoins for services you performed as an independent contractor, you might wonder if it constitutes self-employment income.

According to the IRS, self-employment income includes all gross income from any trade or business you engage in, other than as an employee. The fair market value of Bitcoins you receive for your services (measured in U.S. dollars as of the date you receive payment) is self-employment income and consequently is subject to self-employment tax.

Reporting to the IRS

You might wonder how to report your Bitcoin or other cryptocurrency transactions on your annual tax return.

The basic tax rules that are applicable to property transactions apply to transactions using virtual currency. The IRS has made it clear that Bitcoin is a type of property and your transactions must be reported.

You should file Form 8949, Sales and Other Dispositions of Capital Assets, and Schedule D (1040), Capital Gains and Losses, with your annual tax return to reflect your cryptocurrency transactions.

Failure to Report

What will happen if you skip reporting your Bitcoin or other digital currency transactions on your tax returns? Will the IRS know?

The fact that in 2014 the IRS issued a comprehensive notice including a Q&A section shows that the IRS is well aware that Bitcoin and other cryptocurrency transactions are more than a passing fad. As with any tax law or IRS rules, you assume certain risks if you fail to comply.

The commercial real estate (CRE) industry comprises many different types of service providers, including property management companies, brokerage firms, banks, and other types of lenders. When a CRE transaction takes place, various operators are involved, requiring extensive sharing of official property documents and financial information, all of which need to be validated. Validating all this information across all parties slows down each transaction, which can take weeks or months to complete. Many CRE firms have turned to blockchain to speed up execution times, decrease errors and increase transparency in each transaction.

What Is Blockchain?

Blockchain technology is a way to store and transfer information in an encrypted manner by distributing data instead of copying it in a central location. Blockchain does so through a cloud, peer-to-peer network that eliminates the need for a third party, which ultimately reduces transaction fees. A digital ledger is then created and updated with each financial transaction in blocks.

There are plenty of benefits to making transactions and transferring data using blockchain as the technology is not controlled by one central entity, such as a central bank. This means that breaching these blocks is extremely difficult, maintaining the sanctity and transparency of its transactions and data.
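Why tampering is hard to hide can be shown with a toy ledger. The sketch below is a deliberately simplified illustration that chains blocks together with SHA-256 hashes. It leaves out the peer-to-peer network, consensus, and digital signatures that real blockchains depend on, and the function names and transaction fields are made up.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, transactions):
    """Hash a block's contents together with the hash of the previous block."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "transactions": transactions},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, transactions):
    """Append a new block that points at the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    })


def is_valid(chain):
    """Check that every block's stored hash and back-link still match its contents."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_HASH
        if block["prev_hash"] != expected_prev:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["transactions"])
        if block["hash"] != recomputed:
            return False
    return True


ledger = []
add_block(ledger, [{"from": "buyer", "to": "seller", "amount": 275000}])
add_block(ledger, [{"from": "tenant", "to": "landlord", "amount": 2000}])
valid_before = is_valid(ledger)

ledger[0]["transactions"][0]["amount"] = 1  # tamper with an old record
valid_after = is_valid(ledger)
print(valid_before, valid_after)
```

Because each block stores the hash of the one before it, quietly changing an old record means recomputing every later block as well, and in a distributed setting doing so on the copies held by other participants.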

Blockchain is the backbone of cryptocurrencies such as Bitcoin, which offer speedy and low-cost ways of sending and receiving money.

Faster Transactions

One of the most exciting ways blockchain is disrupting the CRE world is in the form of smart contracts. The industry currently relies on an inefficient system of old-school verification of property ownership by conducting research to ensure the property belongs to the party who is selling it.

Blockchain can reduce the time it takes to establish the chain of custody for CRE properties, as a property’s title would be stored on a public ledger. This would remove the need for another central repository, thus reducing transaction, state, city and legal costs. The same principle would apply to leases recorded via blockchain.

More Transparent Deals

Blockchain can also ensure that real estate assets are more liquid and that the terms of the agreement are fully understood by both sides, as every piece of data regarding a property would be stored publicly. This includes data on former owners, construction done on the property, past maintenance costs and records of former inspections.

Having all this information available would give the investor a more comprehensive idea of the property they are investing in. Blockchain essentially ensures that everyone is on the same page and both sides are fully aware of what they’re getting into as every piece of information is out there for anyone to access.

Digital Paper Trail

Another challenge with the CRE industry is the fact that public records can be outdated, unreliable or not available. Following a property’s paper trail can be time-consuming and frustrating as a lot of this information is lost due to poor organizational skills from industry workers and legacy systems that lose data when updated.

With blockchain, every piece of information on a property would be available in the same place rather than in multiple physical and digital domains. Blockchain would also help to eliminate the type of fraud that sometimes exists in the industry, as deeds and titles can be counterfeited easily.

Buying Property With Cryptocurrencies

As previously mentioned, Bitcoin is a cryptocurrency that relies on blockchain to complete financial transactions online in a matter of seconds. Some investors and real estate firms have started adding Bitcoin to the industry, including Ivan Pacheco, who bought a two-bedroom condominium in Florida for $275,000 in Bitcoin.

In the residential space, you can buy a condo on the Lower East Side of Manhattan with Bitcoin. Meanwhile, some apartments in New York City are allowing their tenants to pay for rent using Bitcoin. Cryptocurrencies have been historically volatile and they’ve been on the decline since peaking in December 2017, but some investors believe that the future of real estate will be closely tied with Bitcoin and other digital coins.

Nevertheless, blockchain’s role in the CRE industry is becoming more prevalent each day. The technology’s potential to speed up transactions with smart contracts, its ability to add transparency to a deed or title and the fact that it dramatically decreases the chance of fraud suggest that more investors will flock towards firms that use blockchain for CRE transactions.

The law has always been part and parcel of the human experience. Ever since ancient times, laws have been established to help guide and regulate human behavior. As humanity progresses, the laws change as well. With every development that we encounter and create, laws are adjusted to stay relevant, and new circumstances necessitate new laws to govern them. Recent accelerated technological advances have once again challenged the law’s adaptability and required the development of legal technology.

Legal technology is the use of technology and software to provide legal services. As of now, legal technology companies are mostly startups that aim to disrupt the traditionally conservative legal market. These companies want to give law firms and lawyers new ways to serve their clients, something that has hardly changed in the last few decades. Being able to adapt to technological trends is beneficial to law firms and lawyers, especially since their clients are becoming more and more tech-savvy as well.

Knowing that a lawyer is up to date with current technological and digital trends can be a significant encouragement for a client to hire that lawyer. It shows that the lawyer is committed to representing clients with all the assets available. Aside from getting more clients, here are a few benefits of using legal technology:

1. Automation. – Various software is available to law firms that automates the areas of the business that need management. Scheduling meetings, tasks, document management, and even legal time and billing can be automated, reducing the time that lawyers spend on them. Aside from freeing up time, it can also reduce the cost that a law firm would pay if it hired an assistant or office manager to do these admin tasks.

2. Better customer experience. – Now more than ever, it’s critical for a law firm or lawyer to keep their clients happy. Even if you provide the best legal work in your area, if your customers are not satisfied with the overall service that they received, you won’t be hearing from them again. Customer satisfaction should be a top priority for lawyers nowadays, and legal technology can make this easier for you. Software that specializes in e-mail marketing helps you reach out to your clients and keep them engaged. A live chat feature on your firm’s website can also help increase customer satisfaction and convert website visitors into clients.

3. Ease of research. – Although lawyers have adapted to the Internet and have added it to their research methods, many lawyers still use print products regularly. Some of these print products can take a while to be updated, and the law always changes, so it’s best to use an online legal research platform that helps lawyers stay up to date with the law and provides tools that are otherwise unavailable with print products.

4. Adapting to change. – Federal courts have utilized electronic court filing for years now, and a few states are starting to use it too. There may come a time when electronic court filing is mandatory, and to ease this transition, lawyers and law firms should start utilizing litigation support software. Litigation support software aims to assist lawyers in the process of litigation and document review.

Since its development in 2009, cryptocurrency has been in the financial space as both a threat and an innovation to the business and economic scene. Budget investors have been swayed by the virtual monetary device that offers anonymity, easy international transactions, and feasibility as an investment instrument.

As cryptocurrency has become more familiar, it has drawn numerous investors into the market. Their rising number has translated into higher values for crypto coins and more shops that accept virtual currency as payment.

Top Cryptocurrencies

Websites such as CoinMarketCap track cryptocurrencies that are hitting the market and show their current value in dollars. Among the top cryptocurrencies are Bitcoin, Ethereum, and Litecoin.

Bitcoin (BTC). It remains the most popular cryptocurrency. Bitcoin’s decentralized nature paved the way for more cryptocurrencies to enter the market. It continues to be on top of the list of the best cryptocurrencies, not only because of its pioneer status but also because of its increasing market cap in the virtual financial world.

Ethereum (ETH). Bitcoin’s closest cryptocurrency competitor, Ethereum, prides itself on the processing of smart contracts. This cryptocurrency started out as a tool to monetize applications in the Ethereum network. Budget investors are urged to look into its ability to allow the creation of distributed applications without interference from another party. ETH is also popular among initial coin offerings (ICOs), an aid for startup crypto junkies.

Litecoin (LTC). Litecoin is often considered a Bitcoin clone. Familiarity is one of the assets that Litecoin has to offer to its investors since it is one of the oldest cryptocurrencies in the market. Since 2011, its fast transaction speed and close connection to Bitcoin have been its main selling points.

Cheapest Cryptocurrencies

For budget investors, here are a few of the cheapest cryptocurrencies in the market now:

BitShares (BTS), currently trading at $0.086510, with an all-time high of $0.40.

Lykke (LKK), trading at $0.36, with an expected price of $1.50 to $2.30.

Verge (XVG), recommended for long-term portfolio addition as it trades at $0.006560.

DigiByte (DGB), trading at $0.008941, with its highest point being $0.06.

Siacoin (SC), trading at $0.000046.

Protection From Scams and Fraud

US regulators have started to find ways to address the irregularities that surround cryptocurrencies and protect the public from scams and fraudulent activities.

Investors themselves must also take necessary precautions before investing in cryptocurrency. For starters, investors should research the concept of the blockchain, which serves as the facilitator for the financial transactions involving cryptocurrency. Transactions could revolve around financial contracts, real estate deeds, personal identification, bank transfers, and also insurance. After doing the necessary research on the blockchain, investors should also be mindful of ICOs. This type of networked funding, which is usually done to gather capital for startup companies, often turns out to be fraudulent. Investors should take the time to know where they put their coins, as one of cryptocurrency’s disadvantages is its confusing nature. Its popularity often sways newbies into thinking that unrealistic amounts of money can be obtained in just a short investment span.

I’ll keep this short and sweet. I did an analysis of The Killers using some NLP techniques in my last blog post. I scraped both The Killers and Lana Del Rey lyrics previously for some topic model analysis, but it wasn’t so successful. To not waste my scraped data, here are the same results for Lana Del Rey. For details on methodology, you can go check that out.

A preamble. I find Lana Del Rey to be such an intriguing artist. She has this very cinematic and old-fashioned quality to her, but she manages to sound fresh and contemporary at the same time. I enjoyed her Ultraviolence album, which I consider her best work, and disliked Honeymoon, which I consider to be quite boring.

Word Frequency Analysis

It’s quite funny, but not entirely surprising, to see that love, life and baby appear in the top words across all albums. Well, Lana Del Rey is a torch singer, so it’s to be expected.
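For readers who skipped the previous post, a top-words count like this can be sketched in a few lines of Python. This is only a rough illustration, not necessarily the exact code I used: the stopword list is a tiny hand-picked one, the function name is made up, and the sample lines are placeholders rather than actual lyrics.

```python
import re
from collections import Counter

# Tiny hand-picked stopword list, just for this sketch.
STOPWORDS = {"the", "a", "and", "i", "you", "my", "is", "in", "of", "to", "me", "it"}


def top_words(lyrics, n=3):
    """Count non-stopword tokens across a list of lyric strings."""
    counts = Counter()
    for text in lyrics:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Made-up placeholder lines, not actual lyrics.
album = [
    "love is the life i love",
    "baby my baby love",
]

print(top_words(album, n=2))
```

Running the same function once per album gives the per-album top words that the chart compares.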

Lyrics Embedding Analysis

Brite Lights, Sad Girl, Flipside, and Money Power Glory appear to be on the periphery, and can be considered the most unusual songs, semantically speaking. Looking per album, songs from Born to Die seem to be the closest ones overall, while AKA Lizzy Grant and Ultraviolence are all over the place.

Overall, it’s quite difficult to interpret the embedding results, but it could be due to the fact that we’re using a very crude way of turning word vectors into song vectors, and so a lot of information is lost. A Part II would focus more attention on this bottleneck.
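To show where the information goes missing, here is a minimal sketch that assumes the crude method is a plain average of the word vectors in a song. The function name and the 2-d toy vectors are made up for illustration; a real embedding has hundreds of dimensions.

```python
def song_vector(tokens, word_vectors):
    """Average the vectors of the tokens that the embedding knows about."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no known words, so there is nothing to average
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy 2-d vectors, made up for illustration.
toy_vectors = {
    "love": [1.0, 0.0],
    "money": [0.0, 1.0],
    "sad": [-1.0, 0.0],
}

a = song_vector(["love", "money"], toy_vectors)
b = song_vector(["money", "love"], toy_vectors)
c = song_vector(["love", "sad"], toy_vectors)
```

Here a and b are identical, so word order is thrown away, and c collapses to the zero vector because the two words cancel each other out. Both effects are the kind of information loss that makes the song-level plot hard to read.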