How Blockchain Applications Could Improve Your Business or Startup

So, what’s all the fuss? How big of a deal is blockchain technology?

At this point, it’s important to note that we’re not talking about Bitcoin.

Cryptocurrency is not blockchain; it is only one application of blockchain technology. Bitcoin is a cryptocurrency that uses blockchain technology to work. And yes, because Bitcoin was the first application of blockchain technology, there is a lot of hype around the Bitcoin blockchain.

IDC expects worldwide spending on blockchain applications to reach $9.7 billion by 2021. Spending was expected to reach $2.1 billion in 2018, up from $945 million in 2017. The 2021 figure corresponds to a five-year compound annual growth rate of 81.2%.
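If you want to sanity-check growth figures like these yourself, the compound annual growth rate is simple to compute. The sketch below uses only the 2017 and 2021 numbers quoted above over a four-year span. That span is an assumption for illustration; the 81.2% figure is quoted over a five-year window, so the two results need not match exactly.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in US dollars.
spend_2017 = 945e6
spend_2021 = 9.7e9

# Four annual steps separate 2017 from 2021.
implied = cagr(spend_2017, spend_2021, years=4)
print(f"Implied 2017-2021 CAGR: {implied:.1%}")
```

The result lands at roughly 79%, in the same range as the quoted rate.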

And over the past few years, McKinsey found that venture capital for blockchain startups has grown at a steady pace, hitting $1 billion in 2017. And it’s not just the startups that are investing. Leading players are investing as well.

According to McKinsey, IBM has hired more than 1,000 employees to work on blockchain applications. The company also invested $200 million in a blockchain-powered Internet of Things.

In the meantime, the MIT Sloan Management Review says that blockchain could be as “fundamental as the Internet” in shaping the future of business.

Let’s repeat that – as fundamental as the Internet.

Okay, so lots of money? Check. Lots of talk about blockchain being as important as the Internet? Check. So, why isn’t everyone using blockchain technology applications?

Well, the technology happens to be in its infancy. That means it’s going to take quite a bit of innovation and application before blockchain becomes the norm. Regulation and standardization need to happen, not to mention adoption and returns on investment.

So, why should companies look at blockchain solutions now?

Most experts say it’s only a matter of time before blockchain becomes mainstream. It’s better to understand how it works now if you don’t want to risk being left behind.

Step 1: Understanding Blockchain

What are the real benefits of using a blockchain application?

Okay, so the technology is a big deal and you need to take it into consideration. But what is it actually going to do for your business? Who cares if everyone is excited about blockchain if it’s not going to impact your bottom line?

So, here’s a list of a few possible benefits for your company:

Disrupting Business Operations

Speeding up Business Processes

Boosting ROI

Cutting Costs

That’s right. The list contains all the usual suspects that justify any new undertaking. Blockchain technology can refine complicated processes that rely on traceability and visibility. And with the friction gone, you can save time and money.

In the short term, the main benefit is cost cutting.

McKinsey estimates that around 70% of blockchain’s short-term value is in cost reduction. Blockchain applications reduce costs by removing middlemen or the administrative effort of keeping records and managing transactions.

But the most important thing is not to think of blockchain only as a technology that impacts your bottom line. Instead, think of it as a way to execute a business model, create a new market, or interact with markets.

Here are some key aspects of blockchain applications that can boost business processes:

The Elimination of Human Error

Secure Verification

Democratization of Trust

Permanent Recording

Enhanced User Privacy

Decentralization (Tamper-proof Ledger)

The Efficiency of Verifications and Transfers

Enhanced Transparency (Open Source Tech)

Now that you know the general benefits of a blockchain, the next step is finding out if a blockchain application will benefit you.

The ideal way to begin is to make a list of current pain points for your company, shareholders, and customers. Which business processes are slow? Which are causing friction? Which are causing the company to bleed money?

Once you’ve identified existing problems, you can investigate if a blockchain application will provide a unique fix. Ask yourself – would the key aspects of a blockchain improve any of my problematic business processes? If yes, it’s time to look more into what that process would look like if it ran on a blockchain.

The most important thing to remember is this:

Before deciding to use a blockchain technology application, you must have a use case.

You don’t want a solution without a problem to solve. Companies that don’t have a clear use case are less likely to reap the benefits of blockchain technology.

So, let’s take a look at what use cases other companies are finding for blockchain applications. A new study by Deloitte asked companies how they plan to use blockchain applications. Here’s what they answered:

Image Source: Deloitte

Because of the hype around cryptocurrencies, it’s often difficult to see the other applications for blockchain technology. We know that blocks can store information about monetary transactions. But they can also store data about any sort of transaction:

Packages (Supply Chain)

Personal Details (Medical Records)

Assets (Real Estate / Luxury Items)

Energy (Power Grids)

Notice that 53% of companies in the Deloitte study plan to use blockchain applications to handle their supply chain. Only 30% plan to use it for payments.

The bottom line? Blockchains can improve a variety of business processes. And implementing a blockchain application will change the way your business runs.

Step 2: Creating a Blockchain Strategy

Does your company have a use case for blockchain applications?

Let’s go deeper into identifying a use case for your business.

To make things simple, you can think of blockchain as a solution to two needs – record keeping and transactions. You can then address these needs with one of six use cases according to a report by McKinsey.

Another way to look at things is to ask yourself if improving any of the following would solve the problems you’ve identified:

Recording

Tracking

Verifying

Aggregating

If you’re shaking your head no, then a blockchain application probably won’t solve that problem.

Additionally, you’ll want to ask yourself the following questions:

Step 3: Launching Your Blockchain Strategy

When should your company implement a blockchain application?

Let’s say you’ve gotten to the point where you’ve decided you definitely have a use case for a blockchain application. What’s next?

Well, it’s time to decide when to take action. And that’s when doubt might hit you. What if it’s all just hype after all? What if no one adopts blockchain in the long run? What if your investment amounts to nothing?

Maybe you should hold off and see how things develop, right?

These are all valid questions. But while cryptocurrencies boom and bust, blockchain remains a safe investment as long as you have a good use for it.

In fact, many experts say that it’s a bigger risk for companies to wait. Ignoring the blockchain trend for any reason is a “dangerous attitude” according to Gartner.

Why?

Let’s say MIT Sloan is right. Blockchain becomes as “fundamental as the Internet.” In that case, businesses that fail to adopt it may end up as far behind as those that didn’t go digital.

The good news is that you won’t run the risk of getting left behind if you consider blockchain applications now. And you’re here, so you’re further ahead than most.

The remaining problem is when to start using the technology. And for that, there is no single correct answer. You might decide to jump into the deep end or wait until you find a practical blockchain application that fits your needs.

It’s up to you, your business type, and your budget. Will it be better for you to be an adopter or a disrupter? The answer to that question also has a lot to do with your industry.

On that note, here are some stats about the future of blockchain applications:

In a 2018 survey, PwC said 84% of respondents were already involved with blockchain applications. These are the companies already spending. But how involved are they?

Only 15% of the companies PwC surveyed are using a live blockchain application. More than half are in the research and development stage. Here’s a closer look:

Image Source: PwC

The trick is to start small. Don’t start by putting an entire system on a blockchain. PwC advises companies to start with individual processes. That way you can be sure that everything works.

Yet, PwC named blockchain as one of their “Essential Eight” emerging technologies. They believe that blockchain will be an essential technology in 3-5 years. And that’s true for all businesses across all industries.

There are others that are more conservative about the timeline for blockchain development. For example, HBR believes that we’re in for a long ride. But again, it’s never too early to look at possible applications for blockchain. The trick is to tie the technology to new business models.

The bottom line?

Companies are investing in blockchain applications. And that will continue until blockchain becomes an “essential” technology.

Step 4: Basic Blockchain Technical Details

Which architectural components should you consider?

To help you answer the “when” question, it’s also a good idea to consider which architectural components will benefit your business most.

Do you need to build an in-house, private blockchain? Or can you tap into a public one? Or perhaps you need to join a blockchain with lots of other companies?

What type of blockchain will work best for you and your company? First, let’s look at the three types of blockchains:

Public

Private

Consortium

Public blockchain examples include Bitcoin and Ethereum. Anyone can join the blockchain as a validator and send transactions.

Private blockchains are invite only. You might want to consider a private blockchain if you want to keep records of sensitive data or handle in-house accounting. You keep control because your blockchain runs independently of the public Internet.

A consortium blockchain application comprises several companies running as nodes on the blockchain. You still need permission to join, but no single company controls the blockchain. Instead, administrators decide who can see what and who can execute consensus protocols.
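To make the difference concrete, here is a minimal sketch of who gets to validate on each type of blockchain. The function `can_validate` and the `approved` set are hypothetical names for illustration, not part of any real platform. On a private chain, one company manages the approved set. On a consortium chain, the member companies manage it together.

```python
from typing import Set


def can_validate(chain_type: str, participant: str, approved: Set[str]) -> bool:
    """Decide whether a participant may validate blocks on a given chain type."""
    if chain_type == "public":
        # Anyone can join a public chain as a validator.
        return True
    if chain_type in ("private", "consortium"):
        # Permissioned chains only admit participants on the approved list.
        return participant in approved
    raise ValueError(f"unknown chain type: {chain_type}")


approved_members = {"bank-a", "bank-b"}
print(can_validate("public", "stranger", approved_members))      # True
print(can_validate("consortium", "stranger", approved_members))  # False
print(can_validate("consortium", "bank-a", approved_members))    # True
```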

Once you’ve decided which type of blockchain is best, you’ll need to research which blockchain code will fit your needs. Blockchain code differs depending on what you need to do. So, there is no one-size-fits-all solution when it comes to setting up a blockchain for your business.

Image Source: McKinsey & Company

Pro Tip: A good example of a consortium is R3. Comprising more than 70 global banks working together, R3 has developed the Corda blockchain platform. By working together, consortiums can create platforms that standardize blockchain applications.

Step 5: Implementing Your Blockchain Strategy

How would you go about implementing blockchain architecture?

After extensive research, you’ll want to invest in building blockchain architecture.

Without going into blockchain technical details, there are a couple of approaches you could consider. The first is to invest in people who know what they’re doing.

For the most part, blockchain code is open source and organization led. You can find it on GitHub and use it as the backbone of your projects. So, a simple approach would be to hire blockchain programmers for your project and build a solution in-house.

A second approach is to join an existing blockchain network like Quorum or Ethereum. Quorum is J.P. Morgan’s distributed ledger and smart contract platform. Built on Ethereum, Quorum is an “enterprise-focused” solution. Ethereum allows for a more individual, scaled approach.

Do make sure that any enterprise solution you choose has the blockchain components you need. There are five components of blockchain: encryption, immutability, distribution, decentralization, and tokenization.

A third solution is to outsource your project to a company that specializes in blockchain and can build a custom solution for you.

Your choice will rest on a few key aspects of your project:

Do you want your blockchain application to be public, private, or part of a consortium?

Do you want control over building a blockchain for your company?

Do you have the budget and resources to hire a team or outsource your project?

Each solution has pros and cons, and your choice should reflect the individual needs of your business and your blockchain use case.

If you would benefit most from joining a consortium, you might have to wait until others are willing to join you before you can implement a blockchain. If you’re building an in-house solution, your timeline is up to you.

Pro Tip: If you need more help, there are templates for designing business use cases online. You can also use them to make spreadsheets of business processes. Add the processes that would speed up with the removal of a middleman or the creation of smart contracts.

Blockchain Applications Explained – What is a Blockchain and How Does It Work?

Okay, so you know how a blockchain application could help your business. And you know the key aspects of a blockchain. But it’s also good to understand how blockchain works.

The good news – as a concept, blockchain is not impossible to understand.

Let’s take two different blockchain applications as examples – Bitcoin and Ethereum.

Bitcoin is a cryptocurrency blockchain application and Ethereum is a smart contract blockchain application with tokenization. Both are blockchain-based applications, but they have different structures, outcomes, and purposes.

That’s why it’s important to understand blockchain basics. You’ll have a better grasp on how to leverage the technology along with a broader idea of its possible applications.

What is a blockchain?

A blockchain is a form of distributed ledger technology. You can program it to become a growing record of anything of value. It has two parts – the block and the chain. The “chain” is a public database of blocks. A “block” is a chain link that contains information about transactions.

Let’s go into more detail.

A ledger is nothing more than a record of data or transactions. Ledgers date back to ancient times when they were nothing more than clay tablets that read:

“Joe bought two goats from Roger for 60 gold coins.” OR “Joe owns two goats.”

A distributed digital ledger is a shared record spread across a network (distributed). That means there are many copies of the ledger spread out across many computers. Imagine hundreds of copies of the same clay tablets – but on computers.

The system synchronizes the copies so that each one updates with each new transaction.

Because it is a distributed ledger, it is also decentralized as it has no single storage place. Also, there is no single ledger owner – like a bank or corporation.

Nodes – or computers in the network – verify the transactions. Afterward, every copy of the ledger updates.
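Here’s a toy sketch of that idea: several nodes, each with its own copy of the ledger, and every verified entry copied to all of them. The `Node` and `Network` classes and the verification rule are assumptions for illustration. A real peer-to-peer network does this over the wire with a consensus algorithm rather than a simple loop.

```python
from typing import List


class Node:
    """One computer in the network, holding its own copy of the ledger."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ledger: List[str] = []


class Network:
    """Hypothetical stand-in for peer-to-peer synchronization."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes

    def is_valid(self, entry: str) -> bool:
        # Placeholder verification rule: reject blank entries.
        return bool(entry.strip())

    def broadcast(self, entry: str) -> bool:
        """Verify an entry, then append it to every node's copy."""
        if not self.is_valid(entry):
            return False
        for node in self.nodes:
            node.ledger.append(entry)
        return True


network = Network([Node("a"), Node("b"), Node("c")])
network.broadcast("Joe bought two goats from Roger for 60 gold coins.")
network.broadcast("   ")  # rejected, so no copy changes

for node in network.nodes:
    print(node.name, node.ledger)
```

Every node ends up with the same single entry, which is the whole point of a synchronized, distributed ledger.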

So, to repeat:

A blockchain is a distributed, decentralized ledger, allowing trade without a middleman. The blocks are individual records of data (ledger entries). The chain is the complete record of all the blocks (the full ledger).

What kind of information is stored on blocks?

It’s important to note that blocks can contain any type of data you need. Transactions do not have to be monetary. You can trade anything of value on a blockchain from data and storage to energy and assets.

In general, blocks contain four types of information:

Transaction Data (e.g., Date, Timestamp, Dollar Amount)

Personal Details of Transaction Participants

A Unique ID Code for the Block (Hash)

A Cryptographic Hash of the Previous Block

Let’s say you’re buying a house on a site that uses a blockchain application. The block would record the date, time, amount paid, and legal details. That’s straightforward.

Your personal information isn’t as straightforward. The block doesn’t record your real name. Instead, you use a “digital signature” that is unique to you. The same goes for the seller. That way your transactions are private and no one can identify you as the buyer.

At this point, it’s important to note that blocks can store more than one transaction. When it comes to the Bitcoin blockchain, a single block can store up to 1MB of data. That allows for thousands of transactions per block.
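A quick back-of-the-envelope calculation shows where “thousands” comes from. The average transaction size of 250 bytes is an assumption for illustration, as is the block interval of roughly 10 minutes. Real transactions vary in size.

```python
BLOCK_SIZE_BYTES = 1_000_000      # the 1MB limit mentioned above
AVG_TX_SIZE_BYTES = 250           # assumption for illustration only
BLOCK_INTERVAL_SECONDS = 10 * 60  # assumption: roughly 10 minutes per block

tx_per_block = BLOCK_SIZE_BYTES // AVG_TX_SIZE_BYTES
tx_per_second = tx_per_block / BLOCK_INTERVAL_SECONDS

print(tx_per_block)             # 4000
print(round(tx_per_second, 1))  # 6.7
```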

That being said, a block could store only one transaction. How much data goes on a block is dependent on the purpose of the blockchain application.

A block only receives its last bit of information when the nodes verify all its transactions. The final bit of information is the block signature, also known as a “hash.” A blockchain hash is a unique identifying code that helps people distinguish one block from another. Each block also receives the hash of the block that came before it.
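To see how the pieces fit together, here is a minimal sketch of a block that carries its transactions, the previous block’s hash, and its own hash. The `Block` class and its field names are assumptions for illustration. Real blockchains use more elaborate formats (Bitcoin, for example, hashes a compact block header), so treat this as a picture of the idea rather than an implementation.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    transactions: List[dict]
    previous_hash: str
    timestamp: float
    hash: str = field(init=False)

    def __post_init__(self) -> None:
        # The block gets its hash once its contents are fixed.
        self.hash = self.compute_hash()

    def compute_hash(self) -> str:
        """SHA-256 over the block contents, including the previous block's hash."""
        payload = json.dumps(
            {
                "transactions": self.transactions,
                "previous_hash": self.previous_hash,
                "timestamp": self.timestamp,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The first block has no predecessor, so its previous hash is empty.
genesis = Block(transactions=[], previous_hash="", timestamp=0.0)

# Participants appear as signature placeholders, not real names.
house_sale = Block(
    transactions=[{"buyer": "sig-buyer", "seller": "sig-seller", "amount": 250000}],
    previous_hash=genesis.hash,
    timestamp=1.0,
)

print(house_sale.previous_hash == genesis.hash)  # True
```

Because the previous hash is part of what gets hashed, each block is tied to the one before it.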

Once a block is created, it joins the chain. To recap, four things must happen for a new block to join the blockchain:

Transactions

Verification

Storage in a Block

Block Assigned a Hash

Once verified, the block joins the blockchain and becomes public. That means that everyone connected to the blockchain gets an updated copy of the database.

How is that possible? How do verifications occur?

Peer-to-peer networks of computers manage the blockchain. These computers are “nodes.” To add blocks, computers run consensus algorithms.

Consensus algorithms do a few things:

Help Establish Trust

Put the Blocks in Order

Validate Blocks

Secure the Blockchain

Think of it like this: instead of a banker confirming a transaction, an algorithm does it.

There are a few types of consensus algorithms:

Proof of Work

Proof of Stake

Proof of Activity

Proof of Burn

Proof of Capacity

Proof of Elapsed Time

Of course, the list is not exhaustive, and we will not go into every example here. But let’s look at the first two – proof of work and proof of stake.

Bitcoin’s blockchain application uses proof of work. The system makes the computers in the blockchain network compete to solve cryptographic math problems to “prove” they’ve done “work.” In simple terms, the computers rush to solve a super hard math problem.

The odds of guessing the answer on any single attempt are astronomically low. So, the computers must use massive amounts of time and electricity to finish a problem. That’s why completion results in block verification and a payout in Bitcoin.
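Here’s what that “super hard math problem” can look like in miniature. The sketch below searches for a number (a nonce) that makes the hash of the block data start with a given number of zeros. The functions `mine` and `verify` and the leading-zeros rule are simplifications for illustration. Bitcoin itself applies SHA-256 twice and compares the result against a numeric target.

```python
import hashlib


def digest_of(block_data: str, nonce: int) -> str:
    """SHA-256 hex digest of the block data combined with a nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int) -> int:
    """Return the first nonce whose digest starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while not digest_of(block_data, nonce).startswith(target_prefix):
        nonce += 1
    return nonce


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a proof takes one hash, however long the search took."""
    return digest_of(block_data, nonce).startswith("0" * difficulty)


data = "Joe pays Roger 60 coins"
nonce = mine(data, difficulty=4)
print(nonce, verify(data, nonce, difficulty=4))
```

Finding the nonce takes tens of thousands of guesses on average at this difficulty, while checking it takes a single hash. Each extra zero makes the search about 16 times longer.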

Proof of stake algorithms award computers the right to validate blocks. The algorithm could choose you for various reasons depending on how it’s written. One example? It awards validation rights to a computer that holds a certain amount of tokens.

The nice thing about proof of stake is that it doesn’t use as much electricity as the proof of work consensus algorithm. After validation, the system rewards validators with transaction fees.
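One common way to picture proof of stake is a lottery weighted by how many tokens each participant holds. The sketch below is a simplified illustration under that assumption. The names, stake amounts, and the `choose_validator` function are made up, and real protocols add many more rules.

```python
import random
from typing import Dict


def choose_validator(stakes: Dict[str, int], rng: random.Random) -> str:
    """Pick one validator at random, weighted by the tokens each one holds."""
    names = list(stakes)
    weights = [stakes[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


stakes = {"alice": 60, "bob": 30, "carol": 10}
rng = random.Random(42)  # fixed seed so the run is repeatable

picks = [choose_validator(stakes, rng) for _ in range(10_000)]
share = {name: picks.count(name) / len(picks) for name in stakes}
print(share)
```

Over many rounds, each participant validates roughly in proportion to their stake, with no energy-hungry puzzle involved.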

So, consensus algorithms allow for the verification of transactions. Plus, they help secure the blockchain and provide trust among nodes.

And what about security of blockchain applications?

The nature of blockchain technology makes it one of the most secure ways to transfer money and data online. It also solves the “double spending” problem – spending the same money twice.

A blockchain is a growing list of records that cannot be edited or erased, which is why the technology is trustworthy. Here’s how:

First, the blocks themselves are virtually impossible to alter.

See, blocks join the chain in chronological order. As they join, they get a blockchain hash along with the hash of the preceding block. The first block in the chain is the “Genesis” block. It does not have the hash of a preceding block as it’s the first one.

Now, if a hacker wanted to alter information on a block, the block’s hash would change. That means the altered block no longer matches the copy of its old hash stored on the next block. So, the hacker would have to change that hash as well. And so on and so forth until all the following blocks are up to date.
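You can watch this chain reaction happen in a few lines of code. The sketch below builds a tiny three-block chain, tampers with the middle block, and then checks where the chain breaks. The helper names and the block layout are assumptions for illustration.

```python
import hashlib
from typing import Dict, List


def block_hash(data: str, previous_hash: str) -> str:
    """Hash a block's data together with the hash of the block before it."""
    return hashlib.sha256(f"{previous_hash}|{data}".encode("utf-8")).hexdigest()


def build_chain(entries: List[str]) -> List[Dict[str, str]]:
    chain = []
    previous_hash = ""  # the genesis block has no predecessor
    for data in entries:
        current = block_hash(data, previous_hash)
        chain.append({"data": data, "previous_hash": previous_hash, "hash": current})
        previous_hash = current
    return chain


def first_broken_link(chain: List[Dict[str, str]]) -> int:
    """Return the index of the first inconsistent block, or -1 if the chain is intact."""
    previous_hash = ""
    for index, block in enumerate(chain):
        if block["previous_hash"] != previous_hash:
            return index
        if block["hash"] != block_hash(block["data"], block["previous_hash"]):
            return index
        previous_hash = block["hash"]
    return -1


chain = build_chain(["Joe buys two goats", "Roger buys a cart", "Joe sells one goat"])

# A hacker rewrites the second block and recomputes only that block's hash.
chain[1]["data"] = "Roger buys ten carts"
chain[1]["hash"] = block_hash(chain[1]["data"], chain[1]["previous_hash"])

print(first_broken_link(chain))  # 2: the next block still points at the old hash
```

Fixing block 2 would break block 3, and so on down the chain.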

But aren’t computers sophisticated and fast enough to recalculate the invalid hashes?

Well, that’s where proof of work and other consensus algorithms come into play. They work as added safety measures to make sure hackers can’t change blockchain hashes.

See, these algorithms slow down the creation of new blocks. With Bitcoin, it takes about 10 minutes on average for the network to create a proof of work and add a new block to the chain. A hacker would need to recalculate the proof of work for the altered block and all the following blocks. And that takes time.

Finally, the fact that a blockchain is distributed also puts a stop to hacking. When a new block is created, every node in the network gets a copy for verification. At this point, everything needs to be correct for the network to accept it.

To get the network to accept an altered block, a hacker would have to take control of more than 50% of the network. Good luck with that.

That means a hacker needs to change the hashes, solve the proofs, and take over more than 50% of the computer network to alter a transaction. And that’s virtually impossible.
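To put a number on “virtually impossible,” the Bitcoin whitepaper models the race between an attacker and the honest network as a gambler’s ruin problem. If the attacker controls a share q of the computing power and the honest nodes control p = 1 - q, the chance of ever catching up from z blocks behind is (q/p)^z when q is smaller than p, and 1 otherwise. The sketch below computes just that simplified formula. The function name is made up for illustration.

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance that an attacker ever catches up from `blocks_behind` blocks back."""
    honest_share = 1.0 - attacker_share
    if attacker_share >= honest_share:
        # With half the network or more, the attacker eventually wins the race.
        return 1.0
    return (attacker_share / honest_share) ** blocks_behind


for share in (0.10, 0.30, 0.50):
    print(share, catch_up_probability(share, blocks_behind=6))
```

With 10% of the network, the attacker’s chances from six blocks back are a few in a million. At 50% or more, they are certain to succeed eventually, which is why that threshold matters.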

Different Uses for Blockchain Applications and Why They Work

So, how do you know if your use case is going to work? A good way to answer that question is to look at existing blockchain applications.

For starters, here is a shortlist of blockchain applications now in use:

Of course, the list is not exhaustive. UPS, FedEx, and DHL are all using blockchain applications. Plus, there are many startups investing in blockchain and big players are teaming up to shape solutions together. For example, Walmart uses IBM’s blockchain and Augur uses Ethereum.

Meanwhile, Microsoft has teamed up with Ernst & Young to create a blockchain for the gaming industry that will make it easier to manage royalties and rights.

Pro Tip: Joint efforts are a great approach to blockchain applications. When companies like Maersk and IBM or Microsoft and Ernst & Young team up, each brings unique resources to the table. In the end, they can do more together than they could alone.

A Closer Look at Smart Contract Blockchain Applications

It’s worth it to take a closer look at smart contract blockchain applications.

What are smart contracts?

Smart contracts are self-executing contracts that run on blockchain technology. One current example of the technology in action is the Ethereum blockchain.

Ethereum allows users to:

Create and Use Smart Contracts

Create Cryptocurrencies or Tokens

Store and Manage Cryptoassets

Smart contracts are much the same as regular contracts. The main difference? Smart contracts make automatic payouts when relevant parties meet the conditions of the contract.
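Here’s a rough sketch of that automatic payout, written as plain Python rather than real contract code. The `EscrowContract` class, the balances, and the delivery condition are assumptions for illustration. On Ethereum, contracts are typically written in a language such as Solidity and executed by the network itself.

```python
from typing import Dict


class EscrowContract:
    """Pays the seller automatically once delivery is confirmed."""

    def __init__(self, buyer: str, seller: str, amount: int) -> None:
        self.buyer = buyer
        self.seller = seller
        self.amount = amount
        self.funded = False
        self.paid_out = False

    def deposit(self, balances: Dict[str, int]) -> None:
        """Lock the buyer's money inside the contract."""
        if self.funded:
            raise ValueError("contract already funded")
        if balances[self.buyer] < self.amount:
            raise ValueError("buyer has insufficient funds")
        balances[self.buyer] -= self.amount
        self.funded = True

    def confirm_delivery(self, balances: Dict[str, int]) -> None:
        """The condition is met, so the payout happens with no one approving it."""
        if not self.funded or self.paid_out:
            raise ValueError("nothing to pay out")
        balances[self.seller] += self.amount
        self.paid_out = True


balances = {"buyer": 100, "seller": 0}
contract = EscrowContract("buyer", "seller", amount=60)
contract.deposit(balances)
contract.confirm_delivery(balances)
print(balances)  # {'buyer': 40, 'seller': 60}
```

The rules live in the code, so neither side has to trust a middleman to release the money.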

When combined, smart contracts and cryptocurrencies have the power to create new markets. And smart contracts alone could disrupt every industry. Supply chain management, prediction markets, and royalty management are only a few examples.

But let’s take a look at the financial industry. You may find it surprising, but banks and financial institutions are blockchain leaders. Despite being the middlemen, financial institutions have many use cases for blockchain applications. And according to McKinsey, 90% of Australian, European, and North American banks are already experimenting with or investing in blockchain technology.

When it comes to smart contracts, banks can use them to grant personal loans and mortgages with massive cost savings. A study by Capgemini shows that smart contracts can save consumers between $480 and $960 on average.

Banks will also have the potential to cut processing costs between $3 and $11 billion annually in the US and the EU. Everyone saves money on mortgages and loans with the implementation of smart contracts.

Beyond personal loans and mortgages, banks can use blockchain and smart contracts to:

Process Insurance Claims

Speed up Investment Banking Procedures

Handle Payment Processing

For banks, smart contracts speed up all core business processes while cutting costs.

A Closer Look at Supply Chain Management Blockchain Applications

Remember how more than 50% of the participants in Deloitte’s survey planned to use blockchain applications to handle aspects of their supply chain?

That’s because it’s one of the most likely use cases for blockchain technology. Why? Supply chains rely on transparency, trust, and efficiency to work. And blockchain applications enhance all three of these qualities.

Here’s how it might work:

Let’s say that you sell salads. Do keep in mind that Walmart uses a blockchain to track mangos. But let’s stick to salads.

To make your salads, you need lettuce. You get that lettuce from various suppliers. You send the lettuce to various vendors. So, using a blockchain application would allow you to track your salad and the lettuce in it from farm to table.

You could know where your lettuce is at any point, who handles it, and whether it’s in good condition. You can keep track of payments and deliveries. And in the case of contaminated lettuce, you can trace the product back to the source without accruing significant losses.
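In code, tracing a batch back to its source is just a matter of reading its recorded history in order. The events, batch IDs, and company names below are invented for illustration. On a real blockchain, these entries would sit in blocks rather than in a Python list.

```python
from typing import Dict, List, Optional

# Hypothetical shipment events, in the order they were recorded.
events: List[Dict[str, str]] = [
    {"batch": "lettuce-17", "actor": "Green Farm", "action": "harvested"},
    {"batch": "lettuce-17", "actor": "Fresh Trucks", "action": "shipped"},
    {"batch": "lettuce-42", "actor": "Hill Farm", "action": "harvested"},
    {"batch": "lettuce-17", "actor": "Salad Co", "action": "received"},
]


def history(batch: str) -> List[Dict[str, str]]:
    """Every recorded event for one batch, oldest first."""
    return [event for event in events if event["batch"] == batch]


def source_of(batch: str) -> Optional[str]:
    """Who handled the batch first, or None if the batch is unknown."""
    trail = history(batch)
    return trail[0]["actor"] if trail else None


print(source_of("lettuce-17"))  # Green Farm
print([event["action"] for event in history("lettuce-17")])
```

If batch 17 turns out to be contaminated, you recall that batch and leave batch 42 on the shelves.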

Future Applications of Blockchain: Disrupting Every Industry

But what about the future of blockchain? Here is a list of blockchain applications that could change the way we make digital transactions online:

Asset Management

Charity

Cybersecurity

Data Backup

Data Sharing

File Storage/ Cloud Storage

Food Tracking

Gift Cards and Loyalty Programs

Governance (Voting, Polling, Passports)

Identity Management

Internet of Things (IoT)

Land Title Registration/ Property Records

Medical Records

Neighborhood Microgrids (Energy)

Notary

Personal Data Management

Prescription Drug Tracking

Sharing Economy

Smart Property/ Ownership

Stock Trading

Tax Regulations and Compliance

Weapons Tracking

Wills and Inheritance

Worker Rights

Of course, the list is not exhaustive. Blockchain applications have the ability to disrupt business processes across every industry.

It’s only a waiting game to see where and how blockchain applications will manifest.

Two interesting applications include identity management and smart property.

A Closer Look at Identity Management as a Blockchain Application

Almost all the applications listed above would benefit from blockchain ID management. That’s because online transactions can’t happen without identity verification.

Blockchains could control personal data for passports, medical records, tax compliance, and voting. To be more precise, you would control digital identification documents.

You could grant access to personal details while controlling who sees what. Only need to prove your age? Blockchain technology could make that happen. Want to change your doctor? Give them access to your medical records via a secure blockchain.
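Here’s a small sketch of what “controlling who sees what” could look like. The `IdentityWallet` class and its fields are hypothetical, and nothing here is cryptographic. A real system would rely on digital signatures or similar techniques instead of a Python object. The point is the shape of the idea: answer “is this person over 18?” without handing over the birth date, and grant access field by field.

```python
from datetime import date
from typing import Dict, Set


class IdentityWallet:
    """Holds personal attributes and tracks who may read which one."""

    def __init__(self, attributes: Dict[str, object]) -> None:
        self._attributes = attributes
        self._grants: Dict[str, Set[str]] = {}

    def grant(self, requester: str, field_name: str) -> None:
        """Allow one requester to read one field."""
        self._grants.setdefault(requester, set()).add(field_name)

    def read(self, requester: str, field_name: str) -> object:
        if field_name not in self._grants.get(requester, set()):
            raise PermissionError(f"{requester} may not read {field_name}")
        return self._attributes[field_name]

    def is_at_least(self, years: int, today: date) -> bool:
        """Answer an age question without revealing the birth date itself."""
        born = self._attributes["birth_date"]
        had_birthday = (today.month, today.day) >= (born.month, born.day)
        age = today.year - born.year - (0 if had_birthday else 1)
        return age >= years


wallet = IdentityWallet({"birth_date": date(1990, 5, 17), "medical_records": "blood type A"})
wallet.grant("new-doctor", "medical_records")

print(wallet.is_at_least(18, today=date(2018, 12, 18)))  # True
print(wallet.read("new-doctor", "medical_records"))      # blood type A
```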

Plus, your data would be safer – in theory. Ideally, blockchain technology would make it harder for hackers to steal your identity or personal data.

Many institutions are already researching how to use blockchain applications for identification.

According to McKinsey, more than 25 governments are already running blockchain pilots. Once implemented, blockchain applications would create a wider economy and make things simpler and more secure for citizens.

A Closer Look at Smart Property as a Blockchain Application

Smart property is a blockchain application that could allow you to prove you own something and to manage and trade your assets more easily.

Let’s take a closer look at proof of ownership. Right now, you can’t easily prove that you own things like your TV or your dishwasher. But with blockchain, you could prove that. You could also prove that you own digital property. A good example is CryptoKitties – digital kittens that you can buy with Ether tokens.

These kittens are stored in the blockchain as your property. That means you can trade and sell your kitties, and you have complete proof of ownership.

You could also use blockchain applications to register land and create property titles. Legal procedures for documents are often lengthy, expensive, and susceptible to human error. The idea is that blockchain would make paperwork unnecessary.

Imagine having a digital wallet that contains the proof of ownership for all your stuff, from your house and car to your TV and CryptoKitties. To create proof you would only need to make one transaction on a blockchain. No more paperwork. No more middlemen. No receipts. Nothing.
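As a rough sketch, an ownership registry only needs two rules: an asset can be registered once, and only its current owner can transfer it. The `PropertyRegistry` class and the names below are made up for illustration. A blockchain would store the same log in blocks that nobody can quietly rewrite.

```python
from typing import Dict, List, Tuple


class PropertyRegistry:
    """Tracks who owns what, plus a log of every change of ownership."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self.log: List[Tuple[str, str, str]] = []  # (asset, from, to)

    def register(self, asset: str, owner: str) -> None:
        if asset in self._owners:
            raise ValueError(f"{asset} is already registered")
        self._owners[asset] = owner
        self.log.append((asset, "", owner))

    def transfer(self, asset: str, seller: str, buyer: str) -> None:
        # Only the current owner can hand the asset to someone else.
        if self._owners.get(asset) != seller:
            raise PermissionError(f"{seller} does not own {asset}")
        self._owners[asset] = buyer
        self.log.append((asset, seller, buyer))

    def owner_of(self, asset: str) -> str:
        return self._owners[asset]


registry = PropertyRegistry()
registry.register("tv-serial-123", "joe")
registry.transfer("tv-serial-123", seller="joe", buyer="roger")
print(registry.owner_of("tv-serial-123"))  # roger
```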

That’s more or less the future that blockchain-based smart property promises.

Conclusion

When it comes to blockchain applications, it’s no longer a matter of “if” but “when,” and when is coming soon. More companies are investing in and finding use cases for blockchain technology.

Why?

They understand that blockchain applications have the ability to disrupt outdated business processes. They also understand the benefits – greater transparency, trust, and efficiency. Not to mention a cut in costs and an increase in consumer privacy.

All signs point to blockchain applications as the future. Are you ready to jump on board?

Tagless with Discipline – Testing Scala Code The Right Way

https://www.iteratorshq.com/blog/tagless-with-discipline-testing-scala-code-the-right-way/

Tue, 18 Dec 2018 15:15:15 +0000

With the recent boom in the adoption of so-called final tagless encoding in Scala land, which in turn seems to be addressing the shortcomings of the Free monad approach, the testability of programs is better than ever. The general consensus is that one of the main benefits of the Free / tagless style is that it allows for easy unit testing of programs without the tedious process of setting up dependencies, etc.

You simply bind to the Id monad (or swap interpreters if you’re a Free fan) and you’re good to test all the pure logic. Obviously, these techniques also help with integration testing by virtue of being able to easily transform components between various monadic contexts.

For example, you can produce a DBIO instance out of your computation and interpret it as a Future in an automatically rolled back transaction. No more setting up fixtures and maintaining the ever-elusive db state in tests. Integration tests can be fully parallel and much less flaky. That being said, it is still a bit of a burden to actually write these tests, mainly because of the informality of a component’s specification that results in a lot of repetitive case-by-case testing.

What do I mean?

Let’s see what the typical approach might look like.

Problem Domain

Let’s say that we have a system storing users along with their preferences and e-mails. GDPR aside, we want to identify these users by their e-mail, set some properties, etc… So, in the tagless manner, we have created two repositories with abstracted monadic context:

In short – you have a bunch of emails and you can attach them to the user. Nothing extraordinary here. Additionally, you can do some lookups and updates. Supposedly, you’d like these structures to be kept in a relational database, so you implement these repositories for DBIO (if you use slick) or something similar (if you don’t).

The details of the implementation do not really matter. It’s sufficient to say that it maps abstract operations to the “real ones.” What matters is that it needs to be tested at some point to ensure correctness of mapping, constraint violation handling, etc…

Typical Integration Testing

Normally, you would write a test case for each expected behavior. First, you would need some fixtures and transactions to prepare the db for a test scenario:

Unfortunately, writing these tests is very tedious and it’s tempting to cut some corners by skipping some important cases. For instance, would you test what happens when you save a string made up entirely of whitespace? Or the various combinations in which parts of a UserProfile are missing? Or all the VARCHAR constraints?

Moreover, you need to test a lot of implicit interactions between various methods. What should double save do? What should find do after a successful create ? If find returns something, then what is its relation to getEmails ? All these facts are tested by checking the behavior of the method with respect to some implicit database state. This is achieved by preparing a vast array of fixtures meticulously recreating the desired state before the test.

All this has one detrimental effect when it comes to generality – people work with tagless to abstract away effects. Yet, if you were to exercise this benefit, you’d have to rewrite all the tests with all the specifics of the new effect, making it prohibitively expensive.

Bring on Some Discipline

So, we seek to obtain the following:

Write Less

Test More

Be Explicit about How Methods of Algebra Should Behave

Be Generic

These things seem a bit contradictory. With the typical approach, the only way to test more is to write more tests and fixtures! And how can you be more explicit while being more generic if being explicit means writing fixtures that are everything but generic? It turns out that these properties cannot be obtained through a typical approach, so the only way to proceed is to change the approach.

Instead of writing tests, let’s formulate laws that the implementation of algebra should respect. Then, let’s use the automated law-checking library, Discipline, which will generate a large number of random test cases with ScalaCheck. This will allow us to test with sufficient confidence that any implementation is following our laws.

Thus, we get the following benefits:

We do not have to write tests – only laws and some infrastructure code (data generators, equality definitions).

Tests can exercise cases that are hard to come by when writing them manually (e.g., very large strings, empty values).

Tests work regardless of implementation.

Laws serve as an explicit documentation of behavior.

Let’s see the details!

Writing Laws

When you take a look at the Emails algebra, the following laws come to mind:

For every saved email e, find(e) returns e.

For every saved email e, known(e) returns true.

find is consistent with known i.e., find(e) is defined IFF known(e) is true.

Saving the same email twice always returns an EmailAlreadyExists error.

By translating these laws into operations using this algebra, you get (in pseudo-code):

save(e) >> findEmail(e) <-> pure(Some(e))

save(e) >> known(e) <-> pure(true)

findEmail(e).fmap(_.isDefined) <-> known(e)

save(e) *> save(e) <-> pure(Left(EmailAlreadyExists))

In the example above, we use the standard cats syntax, where a >> b means a.flatMap(_ => b) and a *> b means product(a, b).map(_._2). We also use the <-> symbol to express the “is equivalent to” relation.

You need to be careful to also capture the effects in laws, not just the result. A good litmus test is to see if the law specifies a possible refactoring that doesn’t break anything.

For example, I would not be able to blindly substitute by this law:

save(e) >> known(e) <-> pure(true)

because it completely removes the effect of saving stuff. The correct law would be

save(e) >> known(e) <-> save(e) >> pure(true)

I agree with this insight. For one thing, it makes reasoning about longer expressions correct. Thus, save(e) *> save(e) <-> pure(Left(EmailAlreadyExists)) should similarly be rewritten as:

save(e) *> save(e) <-> save(e) >> pure(Left(EmailAlreadyExists))

You should keep this advice in mind when working on your laws.
Thanks, Oleg!
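To make the point about effects concrete, here is a minimal sketch that uses only the standard library. Program is a hand-rolled state monad standing in for a real effect such as DBIO, and equivalent compares both the result and the final store. All of these names are illustrative assumptions, not the cats or Discipline API.

```scala
object EffectfulLaws {
  final case class Email(value: String)
  sealed trait EmailError
  case object EmailAlreadyExists extends EmailError

  type Store = Set[Email]

  // Tiny state monad: a program turns a store into a new store plus a result.
  final case class Program[A](run: Store => (Store, A)) {
    def flatMap[B](f: A => Program[B]): Program[B] =
      Program { s =>
        val (s1, a) = run(s)
        f(a).run(s1)
      }
    def map[B](f: A => B): Program[B] = flatMap(a => pure(f(a)))
    def >>[B](next: => Program[B]): Program[B] = flatMap(_ => next)
  }
  def pure[A](a: A): Program[A] = Program(s => (s, a))

  def save(e: Email): Program[Either[EmailError, Email]] = Program { s =>
    if (s.contains(e)) (s, Left(EmailAlreadyExists)) else (s + e, Right(e))
  }
  def known(e: Email): Program[Boolean] = Program(s => (s, s.contains(e)))

  // Two programs are equivalent only if the result AND the final store agree.
  def equivalent[A](lhs: Program[A], rhs: Program[A], initial: Store): Boolean =
    lhs.run(initial) == rhs.run(initial)

  val e = Email("jane@example.com")

  // Naive law: the right-hand side forgets that the email was stored.
  val naiveLawHolds =
    equivalent(save(e) >> known(e), pure(true), Set.empty)

  // Corrected law: both sides perform the save.
  val correctedLawHolds =
    equivalent(save(e) >> known(e), save(e) >> pure(true), Set.empty)

  val doubleSaveLawHolds =
    equivalent(
      save(e) >> save(e),
      save(e) >> pure[Either[EmailError, Email]](Left(EmailAlreadyExists)),
      Set.empty
    )
}
```

With this notion of equivalence, naiveLawHolds is false while the two corrected laws hold, which is exactly the difference the advice above is about.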

Similarly, you can devise a set of laws for Users algebra:

For every created user u, identifyUser(primaryEmail(u)) returns u.

For every created user u, identifyUser(e) returns u IFF e has been attached to the user u.

For every user u with profile p, creating the user and then updating their profile is equivalent to creating the user with the profile already updated i.e., createUser(e, p) >>= (u => updateUserProfile(uid(u), f(p))) <-> createUser(e, f(p)).

Attaching n emails via n calls to attachEmail is equivalent to calling attachEmails once with a collection of all n emails.

To be complete, we should have written laws governing the behavior of the remaining methods – find, getEmails, etc… I took the liberty of skipping it to be concise.

Let’s see how we implement these laws in Discipline.

Implementing Laws

The implementation of law checking needs to be tailored to ScalaCheck to achieve automated testing. That is, a law must be a valid ScalaCheck property. We’ll be using the IsEq type provided by cats-kernel-laws for that. The purpose of this type is twofold. First, IsEq(lhs, rhs) states that the left-hand side of the IsEq expression is equivalent to its right-hand side. Second, it is convertible to a ScalaCheck Prop by Discipline. We form IsEq instances by using the handy <-> operator.

So, the implementation of the first law for the Emails algebra might look like this:

For any email: Email, the expression algebra.save(email) >> algebra.findEmail(email) must be equivalent to M.pure(Some(email)). And that’s what we want every implementation of the Email algebra to respect.

Now, we have to tell ScalaCheck how to generate emails. I recommend reading a ScalaCheck tutorial first, but it’s very simple in essence. There has to be an Arbitrary[Email] instance in the implicit scope of tests:

Finally, we’ll have to specify how to check the equivalence for a given monad M. For the DBIO context, we’ll have to run both sides of the action that we check against the test db in a rolled-back transaction (to give each test a clean db state) and then compare the outputs. (This is an adaptation of a snippet found in slick-cats.)

We now have a test suite that checks whether a bunch of random emails can be saved to the db and subsequently looked up. Not only is the logic of these operations tested, but you can also catch errors stemming from an incorrect mapping of the db schema. Let’s see how the whole suite looks:

You can see that you usually need to write additional generators as you add tests. But since they are composable, writing them is quite easy and bound by the size of your domain. (You only have to write new generators for domain-specific things that ScalaCheck does not know how to mock.) Another idea that may be worth exploring is the automatic derivation of Arbitrary instances for regular product/sum types by Magnolia.

You might be wondering why we are interested in generating a function. Sometimes you can form compact laws by stating that the law holds under any transformation f, just as we did in the findKnownConsistency test: def findKnownConsistency(email: Email, f: Email => Email).

It can be read as: given any saved email e and an arbitrary transformation f, the result of find is defined for f(e) if and only if known(f(e)) is true. This is a stronger statement than saying that the law holds for any saved email e, and it lets us skip separately testing the case when find returns None.

To see why, let’s consider an f that prepends xyz to the mailbox part of the email address. The equivalence should hold for this choice of f, and, indeed, save(email) >> find(Email(s"xyz$email")).map(_.isDefined) yields false, and save(email) >> known(Email(s"xyz$email")) yields false as well. Alternatively, when f is the identity function, we expect find to return Some and known to return true. Thus, we have combined two cases into one law.
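The two cases can be observed side by side in a small sketch. It uses a plain immutable Set as a stand-in for the repository, so save, find, and known here are simplified, hypothetical versions of the algebra methods rather than the real implementation.

```scala
object FindKnownConsistency {
  final case class Email(value: String)
  type Store = Set[Email]

  // Simplified, pure stand-ins for the algebra operations.
  def save(store: Store, e: Email): Store = store + e
  def find(store: Store, e: Email): Option[Email] = store.find(_ == e)
  def known(store: Store, e: Email): Boolean = store.contains(e)

  // Save e, then observe both sides of the law for the transformed email f(e).
  def observe(e: Email, f: Email => Email): (Boolean, Boolean) = {
    val store = save(Set.empty, e)
    (find(store, f(e)).isDefined, known(store, f(e)))
  }

  // The law holds when both observations agree.
  def lawHolds(e: Email, f: Email => Email): Boolean = {
    val (found, isKnown) = observe(e, f)
    found == isKnown
  }

  val sample = Email("jane@example.com")
  val prependXyz: Email => Email = e => Email("xyz" + e.value)
}
```

With the identity function both observations are true; with prependXyz both are false. The single law covers the "present" and the "absent" case at once.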

Conclusion

I hope that my article has convinced you that testing in tagless should be based on abstract law rather than ad-hoc test cases. Can you think of any cases where this approach would be inferior to writing tests manually? Certainly, writing laws is harder than coming up with a bunch of test cases, and it requires some getting used to.

“Conceptually, all type classes come with laws. These laws constrain implementations for a given type and can be exploited and used to reason about generic code.” (typelevel.org)

Gone are the days when anonymous shoppers browsed generic stock for an elusive item. Today, anyone can serve customers unlimited, personalized offers tailored to their interests.

All you need is data, right?

Not quite.

With all the options online, you need a system that narrows down the possibilities. Something that learns what people like.

That’s where recommender systems come into play.

Implementing a recommender system allows you to turn raw data into personalized offers. And personalized offers result in higher customer satisfaction, engagement, and sales.

Sounds good, right?

That’s why the following article will tell you:

What recommender systems are and how they work.

Different strategies for implementing recommender systems.

How to check if a recommender system is effective.

What is a Recommender System?

So, let’s start with the basics:

What is a recommender system?

A recommender system is a type of information filtering system. By drawing from huge data sets, the system’s algorithm can pinpoint accurate user preferences. Once you know what your users like, you can recommend them new, relevant content. And that’s true for everything from movies and music, to romantic partners.

Netflix, YouTube, Tinder, and Amazon are all examples of recommender systems in use. The systems entice users with relevant suggestions based on the choices they make.

Recommender systems can also enhance experiences for:

News Websites

Computer Games

Knowledge Bases

Social Media Platforms

Stock Trading Support Systems

And the list is not exhaustive. Bottom line? If you want to provide users with targeted choices, recommender systems are the answer.

Here’s an example of a recommender system in e-commerce. H&M served the following recommendations to users who clicked on “pleated skirt” as a potential buy:

Why Adding a Recommender System to Your Website is Beneficial

So, what are the advantages of adding a recommender system to your website or software?

Here’s a list of just a few:

Increase in sales thanks to personalized offers.

Enhanced customer experience.

More time spent on the platform.

Customer retention thanks to users feeling understood.

A recent study by Epsilon found that 90% of consumers find personalization appealing. Plus, a further 80% claim they are more likely to do business with a company when offered personalized experiences.

The study also found that these consumers are 10x more likely to become VIP customers, who make more than 15 purchases per year.

The moral of the story? If you’re interested in cross-selling or serving personalized offers, a recommendation system is right for you.

How to Solve the Long Tail Problem with Recommender Systems

Another positive benefit of using a recommender system is that you solve the long tail problem of online shopping.

When you go into a brick-and-mortar store, you only see a limited number of items to buy. There’s only so much space, right?

So, recommending customers things to buy is easy. At the front of the store you put your newest, most popular goods on display. Walk into a bookstore and there are three tables:

New York Times Best Sellers

Hot New Vampire Books

Popular Autobiographies

Take your pick.

Now, go online. Boom! You’re presented with millions of options. And if you don’t know what you’re looking for, you might find the sheer number of options overwhelming.

The problem is known as the long tail problem.

So, how does an online retailer help customers suffering from information overload?

Recommender systems.

So, let’s say you want to buy a book. You go online to Amazon and the first thing you see:

It’s the same as the display tables in the brick-and-mortar stores. But once you start making choices on the platform, Amazon’s recommender system takes over. Let’s say you search for The Great Gatsby. Amazon recommends:

Here the system served you Fahrenheit 451. That’s because past Fitzgerald customers must have also bought Bradbury. As an alternative, your recommender system could offer other Fitzgerald books.

How Recommender Systems Provide Users with Suggestions

Using machine learning, recommender systems provide you with suggestions in a few ways:

Collaborative Filtering

Content-based Filtering

Hybrid (Combination of Both)

Collaborative Filtering Recommender Systems

For starters, popular examples of collaborative filtering systems include Spotify, Netflix, and YouTube. But what does a collaborative filtering recommender system do?

A collaborative filtering recommender system analyzes similarities between users and/or item interactions. Once the system identifies similarities, it serves users recommendations. In general, users see items that similar users liked.

There are different types of collaborative filtering systems including:

Item-item Collaborative Filtering

User-user Collaborative Filtering

Item-item Collaborative Filtering

An item-item filtering algorithm analyzes product associations taken from user ratings. Users then see recommendations based on how they rate individual products.

For example, you rate a book or movie as a 10/10. Now, you will see the top rated books or movies with similar attributes. Below is an example from Goodreads.

I created a special list for books that I gave five-star ratings. Goodreads then recommends me the highest ranked books from similar readers’ lists.

It’s not always easy to get users to give items ratings. That’s why item-item filtering can be as simple as clicking on a dress and seeing more dresses.

Ever come across the “people who viewed this item also bought” copy under a product?

That’s right. That’s also an item-item filtering system.

Amazon invented item-item filtering for their recommender system. Item filtering works best when you have more users on your platform than items.
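The "people who viewed this item also bought" idea can be sketched as a simple co-occurrence count. This is a toy illustration under stated assumptions: the baskets are made-up data, and production systems use more refined similarity measures than raw counts.

```scala
object ItemItemFiltering {
  // Hypothetical purchase history: user -> items they bought.
  val baskets: Map[String, Set[String]] = Map(
    "ann"  -> Set("The Great Gatsby", "Fahrenheit 451", "1984"),
    "bob"  -> Set("The Great Gatsby", "Fahrenheit 451"),
    "cara" -> Set("The Great Gatsby", "Tender Is the Night"),
    "dan"  -> Set("1984", "Brave New World")
  )

  // Rank other items by how often they share a basket with `item`.
  def alsoBought(item: String, topN: Int): List[String] = {
    val coCounts: Map[String, Int] = baskets.values
      .filter(_.contains(item))
      .flatMap(basket => basket - item)
      .groupBy(identity)
      .map { case (other, hits) => other -> hits.size }

    coCounts.toList
      .sortBy { case (other, count) => (-count, other) } // ties broken alphabetically
      .take(topN)
      .map(_._1)
  }
}
```

With this data, alsoBought("The Great Gatsby", 2) puts "Fahrenheit 451" first, because two of the three Gatsby buyers also bought it.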

User-user Collaborative Filtering

The other kind of collaborative filtering takes the similarity of user tastes into consideration.

So, user-user collaborative filtering doesn’t serve you items with the best ratings. Instead, you join a cluster of other people with similar tastes and you see content based on historic choices.

Let’s say you use YouTube for the first time. You play a Beyonce song. The system clusters you with other users who also like Beyonce. Then the YouTube recommendation system shows you other videos chosen by users in your cluster. The more choices you make, the more relevant the results.

Ever wonder why your YouTube playlist gets messed up after a party?

It’s because Brenda decided to karaoke Disney songs for two hours. Brenda chose Disney songs, now you’re clustered with the Disney kids.

Thanks, Brenda.
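Here is a minimal sketch of that clustering effect. Similarity is just the number of videos two users have in common, and each unseen video is scored by the overlap of the users who played it. The user names and histories are invented, and real systems use far richer signals.

```scala
object UserUserFiltering {
  // Hypothetical watch histories of other users.
  val others: Map[String, Set[String]] = Map(
    "maya"   -> Set("Beyonce - Formation", "Beyonce - Halo", "Rihanna - Umbrella"),
    "brenda" -> Set("Hakuna Matata", "Let It Go", "Under the Sea")
  )

  // Recommend unseen videos, weighted by how much each user overlaps with `mine`.
  def recommend(mine: Set[String], topN: Int): List[String] = {
    val votes = for {
      (_, theirs) <- others.toList
      overlap = (mine & theirs).size
      if overlap > 0
      item <- (theirs -- mine).toList
    } yield item -> overlap

    votes.groupBy(_._1)
      .map { case (item, hits) => item -> hits.map(_._2).sum }
      .toList
      .sortBy { case (item, score) => (-score, item) } // ties broken alphabetically
      .take(topN)
      .map(_._1)
  }
}
```

A history containing only "Beyonce - Halo" yields Maya's other videos. Add two Disney songs to that history and "Hakuna Matata" jumps to the top, which is the after-party effect described above.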

Content Based Recommender Systems

Content based filtering uses characteristics or properties of an item to serve recommendations. Characteristic information includes:

Characteristics of Items (Keywords and Attributes)

Characteristics of Users (Profile Information)

Let’s use a movie recommendation system as an example. Characteristics for the item Harry Potter and the Sorcerer’s Stone might include:

Director Name – Chris Columbus

Genres – Adventure, Fantasy, Family (IMDB)

Stars – Daniel Radcliffe, Rupert Grint, Emma Watson

A content based recommender system can now serve the user:

More Harry Potter Movies

More Adventure, Family, or Fantasy Movies

More Chris Columbus Movies

More Daniel Radcliffe Movies

Of course, the list is not exhaustive. Once the user makes choices, the recommender system can serve more targeted results.

The system may also show the user more Harry Potter movies. The hypothesis is that if a user liked an item in the past, they might like similar items in the future.

Content based filtering systems can also serve users items based on users’ profiles. You can create user profiles based on historical actions. You can also ask users upfront about their interests and preferences.
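As a sketch of attribute matching, the snippet below scores catalog items by the number of attributes they share with a movie the user liked. The attributes of the first film follow the example above; the rest of the catalog and the scoring rule are illustrative assumptions.

```scala
object ContentBasedFiltering {
  final case class Movie(title: String, attributes: Set[String])

  private val potterAttributes = Set(
    "director:Chris Columbus",
    "genre:Adventure", "genre:Fantasy", "genre:Family",
    "star:Daniel Radcliffe", "star:Rupert Grint", "star:Emma Watson"
  )

  // Small illustrative catalog; attribute tags are simplified.
  val catalog: List[Movie] = List(
    Movie("Harry Potter and the Sorcerer's Stone", potterAttributes),
    Movie("Harry Potter and the Chamber of Secrets", potterAttributes),
    Movie("Home Alone", Set("director:Chris Columbus", "genre:Comedy", "genre:Family")),
    Movie("Heat", Set("director:Michael Mann", "genre:Crime"))
  )

  // Rank other movies by the number of attributes shared with the liked one.
  def similarTo(liked: Movie, topN: Int): List[String] =
    catalog
      .filter(_ != liked)
      .map(m => m -> (m.attributes & liked.attributes).size)
      .filter { case (_, shared) => shared > 0 }
      .sortBy { case (m, shared) => (-shared, m.title) }
      .take(topN)
      .map(_._1.title)
}
```

No user ratings are involved here, which is why this style of system works even for brand-new items, as long as their attributes are known.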

Pro Tip: Using a hybrid recommender system allows you to combine elements of both systems. In general, that means elements of one system can remedy the pitfalls of the other.

Pitfalls of Different Types of Recommender Systems

And now for the bad news. Each type of recommender system has its own set of problems. Let’s take a look.

COLLABORATIVE FILTERING RECOMMENDER SYSTEM PROBLEM

Collaborative filtering needs a lot of data to create relevant suggestions. So, when you start using a platform with a collaborative filtering system, you start cold.

The cold start problem in recommender systems is common for collaborative filtering systems.

For example, when John visits YouTube for the first time, the system has to wait on him to watch several videos. Only then can it serve him relevant recommendations for other videos.

COLLABORATIVE FILTERING RECOMMENDER SYSTEM SOLUTION

A solution to the cold start problem in recommender systems is clustering data with attribute similarities. Let’s go back to our YouTube example.

John visits YouTube for the first time. The first video he selects is a Beyonce video. As mentioned before, the platform will cluster John with other users who watched the same video.

It could also add him to other clusters. Let’s say the video belongs to the “pop song” cluster. Needless to say, the pop song cluster is populated with pop songs, such as “...Baby One More Time” by Britney Spears.

Now, the system can recommend other songs based on the following criteria:

Other Beyonce Listeners’ Choices

Other Songs in the Pop Cluster

Another solution is to start by recommending users popular items.

YouTube and Amazon’s homepages both show users trending or popular items. IMDb has a similar strategy, showing new visitors the top rated 250 film titles.

A final strategy is to request new users to provide the platform with information.
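The popularity fallback can be sketched in a few lines. The view counts, the minHistory threshold, and the personalized function (a stand-in for whatever the collaborative model returns) are all hypothetical.

```scala
object ColdStart {
  // Hypothetical view counts per video.
  val viewCounts: Map[String, Int] =
    Map("video-a" -> 950, "video-b" -> 120, "video-c" -> 4300)

  def mostPopular(topN: Int): List[String] =
    viewCounts.toList
      .sortBy { case (id, views) => (-views, id) }
      .take(topN)
      .map(_._1)

  // Serve popular items until the user has enough history for the model.
  def recommend(
      history: Set[String],
      personalized: Set[String] => List[String],
      minHistory: Int,
      topN: Int
  ): List[String] =
    if (history.size < minHistory) mostPopular(topN)
    else personalized(history).take(topN)
}
```

A new visitor with an empty history gets the trending list; once the history reaches minHistory items, the personalized model takes over.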

CONTENT BASED FILTERING RECOMMENDER SYSTEM PROBLEM

The problem with content based recommender systems is that they are restrictive. You click on a dress and you see more dresses. The system is incapable of knowing that your interests go beyond liking dresses.

CONTENT BASED RECOMMENDER SYSTEM SOLUTION

Again, a common solution is to ask users up front what kind of things they like. And as users interact with your site, you can use historical data to recommend them more tailored choices.

The customer buys a dress and some shoes. Now, you know that she likes both.

How to Implement a Recommender System

There are a couple of ways to go about adding a recommender system to your software or website. But first, you need to find the right people.

Usually, implementing a recommender system comprises three activities done by different people. These activities include:

Designing and evaluating a recommendation model.

Scheduling system updates and piping data into the model and out to the user.

Integrating the recommender system with the company’s “business system.”

Here’s a basic flowchart of a recommender system:

To oversee these activities, you would need to hire (respectively):

Data Scientists (Design and Evaluation)

Data Engineers (Updates and Data Pipeline)

Web and Front-end Developers (Integration with Website)

Let’s say you want to hire in-house people who will implement and manage your recommender system. Expensive? Yes. Here are the average salaries for each according to Glassdoor:

Not to worry. You don’t have to create a recommender system from scratch. There are ready-made SaaS solutions for recommender systems in e-commerce.

And for movie or music recommendation systems, there are off-the-shelf solutions. For example, it is possible to get an algorithm similar to the one that runs Netflix’s recommendation system.

Depending on your needs, you could also consider outsourcing. Outsourcing is beneficial because it enables flexibility and can be cheaper. Plus, you’re sure that you’ve hired people who know what they’re doing, and you don’t have to worry about employee turnover or recruitment.

Pro Tip: You don’t want to hire a data scientist before you have data. They won’t have any work and they will leave or get stuck doing the work of a data engineer. Hire a data engineer first.

Getting Started – Gathering Data for Your Recommender System

As you can see in the chart, there are different datasets you might want to consider like:

User Behavior or User Ratings

Item Attributes

User data is necessary for serving user-item recommendations via a collaborative filtering recommender system. It does require you to have access to a large number of user interactions.

To get user data, you can either ask for ratings or draw conclusions from user behavior. Asking for ratings is problematic because most users don’t bother to give items ratings. Drawing conclusions from user behavior is problematic because of the cold start problem.

As mentioned before, some solutions include:

Recommending Popular Items from the Start

Onboarding Users through Profile Creation

Clustering Users with Similar Users

Second, you can design an attribute ontology to gather item attributes for item-item recommendations. If you have limited metadata from users, you can use item attributes to create a content-based recommender system.

Either way you’ll need your web developer to integrate both front and back-end features into a supported system. Once that’s in place, the process runs continuously.

Remember, the more data you have and the fresher it is, the better your recommender engine.

Designing and Building a Recommender System for Your Business

Once you have data, your data scientists design a way to use it to build recommendations.

Building a utility matrix using matrix factorization techniques for recommender systems is a popular way to get started.

Do keep in mind that there are many ways to build recommendation systems. The following example is a highly simplified version of what a system looks like in reality.

The technique allows you to complete unknowns on a matrix based on user-item interactions.

Here’s a very basic example of a utility matrix for recommender systems:

Users | Monty Python and the Holy Grail | Monty Python’s Life of Brian | Monty Python’s The Meaning of Life
John | 5 | (not rated) | 2
Jane | (not rated) | 3 | (not rated)

Utility matrices organize rankings users assign to different items. In the example above, John gave Monty Python and the Holy Grail the highest score, while assigning The Meaning of Life a two. He didn’t rank Life of Brian.

When using a utility matrix, you should assume that the data will be sparse with more blank spaces than rankings. The goal of the matrix is to predict how John and Jane will rank the remaining films.

The example above doesn’t give much insight into whether Jane will like either of the two films she did not rank. But in a real life scenario, you would have much more data.

A real recommender system would compare her score with hundreds of similar scores. Patterns begin to emerge. For example, let’s say people who gave Life of Brian a 3 gave Holy Grail a 5 while giving The Meaning of Life a lower score.

The logical conclusion? Jane will like Holy Grail more than The Meaning of Life as the others did. So, let’s recommend her Monty Python and the Holy Grail.

Regardless, building a utility matrix requires large amounts of data. And that data is always going to be sparse, so your recommendation system algorithms will need to account for that.
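The reasoning above can be sketched with a sparse map in which missing keys are the blank cells. Note that this is a simple neighbor-averaging stand-in, not matrix factorization. John's and Jane's ratings mirror the example; the two extra viewers are made up to play the role of "similar scores".

```scala
object UtilityMatrix {
  // Sparse utility matrix: (user, film) -> rating. Blank cells are simply absent.
  val ratings: Map[(String, String), Int] = Map(
    ("John", "Holy Grail") -> 5, ("John", "Meaning of Life") -> 2,
    ("Jane", "Life of Brian") -> 3,
    // Hypothetical viewers who rated Life of Brian the same way Jane did.
    ("Eric", "Life of Brian") -> 3, ("Eric", "Holy Grail") -> 5, ("Eric", "Meaning of Life") -> 2,
    ("Terry", "Life of Brian") -> 3, ("Terry", "Holy Grail") -> 5, ("Terry", "Meaning of Life") -> 3
  )
  val films: Set[String] = Set("Holy Grail", "Life of Brian", "Meaning of Life")

  // Fill a user's blanks with the average rating given by viewers who
  // agree with the user on at least one film.
  def predict(user: String): Map[String, Double] = {
    val mine = ratings.collect { case ((u, film), r) if u == user => film -> r }
    val peers = ratings.keys.map(_._1).toSet.filter { other =>
      other != user && mine.exists { case (film, r) => ratings.get((other, film)).contains(r) }
    }
    (films -- mine.keySet).flatMap { film =>
      val peerRatings = peers.toList.flatMap(p => ratings.get((p, film)))
      if (peerRatings.isEmpty) None
      else Some(film -> peerRatings.sum.toDouble / peerRatings.size)
    }.toMap
  }

  // Recommend the unrated film with the highest predicted rating.
  def recommend(user: String): Option[String] = {
    val predicted = predict(user).toList
    if (predicted.isEmpty) None
    else Some(predicted.reduce((a, b) => if (b._2 > a._2) b else a)._1)
  }
}
```

For Jane, the predicted rating for Holy Grail is higher than for The Meaning of Life, so Holy Grail is the recommendation.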

Once the system is in place, data engineers flood the system with vast amounts of data. The model then runs in real time, preparing batches of recommendations for users.

Pro Tip: To populate a utility matrix with behavioral data, you would use “1” for “like” and “0” for no action taken. Using 0 doesn’t mean the user dislikes that thing, only that she took no action.

You’ll need to test for effectiveness so you can continue to improve recommendations.

How to Evaluate if a Recommender System is Effective

How do you know if your new recommender system is up to snuff? Well, there are a few ways to measure the effectiveness of your recommender engine:

User Studies / Personas

Offline Recommender System Survey

Online Recommender System Survey (A/B Testing)

But first, you might want to take a look at the different features of recommender systems. You can use features as metrics for benchmarking.

For example, coverage is the degree to which you cover all available items and actions with your system. It’s a useful metric for making sure your system is thorough. Keep in mind that not every feature is suitable for every type of recommender system.

User Preference: When systems make recommendations based on user interests, habits, and goals.

Prediction Accuracy: When systems make accurate predictions about the results of serving a recommendation to a user.

Coverage: The degree to which recommendations cover all available items and actions.

Confidence: The system can tell the user how confident it is about its recommendations.

Trust: How much users trust the system. Often, if a system explains its recommendations and they are reasonable, the user trusts the system.

Novelty: When the system recommends items that users did not previously know about.

Serendipity: How surprising the user finds the recommendations.

Diversity: The more diverse the recommendations, the broader the offer for users.

Utility: Measure of how useful the recommendation is for the user.

Risk: When the recommender system can tell the user the risk of following a recommendation.

Robustness: The stability of the system in the face of abuse or fake information.

Privacy: Despite users willingly giving information to recommender systems, they still want that information to be private.

Adaptivity: When the recommender system can adapt and serve users relevant recommendations even when the content and environment are dynamic.

Scalability: The system can handle a growing amount of data.

Once you’ve decided which features you need, you’ll want to make sure they’re working. You can do that by testing your system.
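Some of these features translate directly into numbers. As one example, catalog coverage can be computed as the share of catalog items that show up in at least one user's recommendations. The function below is a sketch of that one definition; other definitions of coverage exist.

```scala
object CoverageMetric {
  // Share of catalog items that appear in at least one recommendation list.
  def coverage(catalog: Set[String], recommendations: Map[String, List[String]]): Double = {
    val recommended: Set[String] = recommendations.values.flatten.toSet
    if (catalog.isEmpty) 0.0
    else (recommended & catalog).size.toDouble / catalog.size
  }
}
```

For a four-item catalog where users only ever see two distinct items, the coverage is 0.5. A low value suggests the system keeps pushing the same few products.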

User Studies as a Way to Evaluate Recommender Systems

In the beginning, you will have no real knowledge about your users or their interests. So, how do you create personalized offers for people you don’t know?

The easiest way to test what will be effective is to create user personas. There are various assumptions you can make about your future users before they start using your platform.

Let’s say you sell books.

You can create personas based on choice prediction. Michael selects a book. With some statistical accuracy, your recommender system can predict whether Michael will like another book. You could base it on genre, author, and other content attributes.

You can create personas along these interest paths to serve users initial recommendations.

A good way to assign user personas is to have new users answer a few brief onboarding questions. And that’s especially true for mobile applications.

Here’s an example of Foursquare’s onboarding process:

Good onboarding questions might also include requests for demographic data about your users. Knowing simple information like age and location helps you target your recommendations. If your reader is 15, she is going to be much more tempted by a Twilight recommendation than if she’s 80.

One free and easy way to get some basic insight into user demographics is to use Google Analytics. Most of you already have Google Analytics hooked up to your website. So, why not use it to give you insight into user demographics?

Google Analytics shows you the age, gender, and interests of the people visiting your website. You could also use social media analytics to get ideas for user personas.

To use Google Analytics, you click on “Audience” and then “Demographics” to view the age, gender, and interests tabs. Enabling the advertising features can give you a deeper look into user demographics.

Offline Recommender System Survey

It is important to note that offline recommender system evaluations are often treated with skepticism. That’s because you use offline methods before the launch of a recommender engine. So, instead of involving users, you run simulations.

And the results of these simulations may not correlate with real user satisfaction. The responsibility of running a simulation will fall to those who created your recommender system.

Online Recommender System Survey (A/B Testing)

Once you launch your recommender system, you run online evaluations. These evaluations comprise various A/B tests that you serve to users live.

If you’re unfamiliar with A/B testing, it involves serving different options to users who arrive at the same point. One user sees option “A” while the other sees option “B.” Whichever option does better, that’s the one you continue to serve all users.

By running A/B tests, you can see which recommendation inspires more clicks and conversions. While ideal in the long run, this form of testing can have a negative impact on revenue and user experience if users don’t like either option.
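A bare-bones version of this setup might look like the sketch below. Users are assigned to a variant by hashing their ID, so the same user always sees the same option, and the click-through rate is compared per variant. The hashing rule and the Impression log format are assumptions made for illustration.

```scala
object AbTest {
  sealed trait Variant
  case object A extends Variant
  case object B extends Variant

  // Stable assignment: the same user ID always lands in the same bucket.
  def assign(userId: String): Variant =
    if (math.abs(userId.hashCode % 2) == 0) A else B

  final case class Impression(variant: Variant, clicked: Boolean)

  // Fraction of impressions of a variant that led to a click.
  def clickThroughRate(log: List[Impression], variant: Variant): Double = {
    val shown = log.filter(_.variant == variant)
    if (shown.isEmpty) 0.0 else shown.count(_.clicked).toDouble / shown.size
  }

  // Naive comparison: B wins only if it is strictly better than A.
  def winner(log: List[Impression]): Variant =
    if (clickThroughRate(log, B) > clickThroughRate(log, A)) B else A
}
```

In practice, you would collect enough traffic and run a statistical significance test before declaring a winner; a raw comparison like this one is only the starting point.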

Conclusion

Internet users and online shoppers have come to expect personalized experiences. At the very least, they want websites to make recommendations so they don’t waste time sifting through things they don’t like.

Recommender systems are a great way for any business to personalize their offers. While implementing a system may be costly, you’re sure to benefit from targeting your content.

Whether your goal is to keep people on your platform (YouTube) or show users exciting, new offers (Amazon) – a recommender system is the answer.

I’m the type of person who thinks about the darkest questions humanity has ever faced, such as Are humans born good or bad? or What will I have for my lunch? But there was one day when I thought: “What does scalac (Scala compiler) do during compiling time?” So, I asked one of my friends, Google, and I found out that my code goes through more than 25 phases during that time! I gotta say, I was intrigued.

Then, there was a strange bug with slickless. I hope you’re familiar with Slick — it’s one of the most popular libraries for databases in Scala, but it has one constraint. A case class that represents a table in the database cannot have more than 22 fields. Slickless provides the necessary implicits that enable us to use HList instead of case class representation. However, if a list of fields in HList is in the wrong order, or is incomplete, compiling time rapidly increases (more than 15 minutes). That was when I knew I had to know how it all works.

Scalac is difficult. And that’s especially true for me, as I’ve just finished physics, and started learning compiler techniques. Just imagine — you write some fancy functional Scala code that gets translated into JVM-not-so-cute bytecode (as shown below) and then it just works!

First, I want to explain some of the abstractions used in the process.

val a = 5
val b = 6
println("Result: " + (a + b))

An abstract syntax tree, also known as an AST, is a representation of how your code is interpreted by the compiler. It is created by the lexer and the parser, two common compiler components. The AST preserves all operations and values in your project. The easiest way to explain it is to show you examples:

A symbols table is a data structure that holds all the values, classes, objects, and traits that you used in your project, with additional info about their kind and owners. It enables the compiler to look them up at any time. In Scala, you can see it by passing the -Yshow-syms flag to the compiler. I will show you an example later.

The JVM is a machine that turns .class files into a running program. Many convenient features are part of the JVM, for example the JIT (a special compiler used for optimizations) and the garbage collector (a memory manager).

AST Created from Code with Existential Types

How it actually works

Usually, a compiler’s work is split into three parts: front end, middle end, and back end. The front end creates an AST and modifies it on some basic level. The middle end does platform-independent optimizations (such as tail calls), and the back end does optimizations for specific CPUs and generates assembly files. The Scala compiler is organized somewhat differently; for instance, CPU-specific optimizations are delegated to the JVM.

All 25 phases are connected in a pipeline, and each phase transforms your code. Linguistic complexity can be described by this graph:

At first, the intermediate result of compilation becomes more complex as the tree is created and typed. After that, it gets simpler as the compiler removes advanced language features.
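The pipeline idea can be sketched in a few lines. This is a toy, not scalac: the expression type, the two phases (desugarSub and foldConstants), and the Phase case class are all hypothetical names I made up to show how each phase takes the tree produced by the previous one and how an "advanced" feature (here, subtraction) disappears along the way.

```scala
object ToyPipeline {
  sealed trait Expr
  final case class Num(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr
  final case class Sub(left: Expr, right: Expr) extends Expr // the "fancy" feature
  final case class Neg(expr: Expr) extends Expr

  final case class Phase(name: String, transform: Expr => Expr)

  // Phase 1: rewrite a - b into a + (-b), so later phases never see Sub
  def desugarSub(e: Expr): Expr = e match {
    case Sub(l, r) => Add(desugarSub(l), Neg(desugarSub(r)))
    case Add(l, r) => Add(desugarSub(l), desugarSub(r))
    case Neg(x)    => Neg(desugarSub(x))
    case n: Num    => n
  }

  // Phase 2: evaluate whatever is already constant
  def foldConstants(e: Expr): Expr = e match {
    case Add(l, r) =>
      (foldConstants(l), foldConstants(r)) match {
        case (Num(a), Num(b)) => Num(a + b)
        case (a, b)           => Add(a, b)
      }
    case Neg(x) =>
      foldConstants(x) match {
        case Num(a) => Num(-a)
        case other  => Neg(other)
      }
    case Sub(l, r) => Sub(foldConstants(l), foldConstants(r))
    case n: Num    => n
  }

  val phases: List[Phase] =
    List(Phase("desugar", desugarSub), Phase("fold", foldConstants))

  // Feed the output of each phase into the next one
  def run(e: Expr): Expr =
    phases.foldLeft(e)((tree, phase) => phase.transform(tree))
}
```

Running `ToyPipeline.run(Sub(Num(7), Add(Num(1), Num(2))))` first rewrites the subtraction and then folds everything down to `Num(4)`.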

In the next part, I want to present some of the compiler phases in Scala. I will be using the prettier representation of ASTs provided by Scala to show you the differences between each phase.

Phases

Parser

The first phase creates an untyped AST using the scanner and the parser. This is the moment when syntax errors are reported. Moreover, XML literals are translated and our precious functional code is desugared into simpler structures. For example:
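A for comprehension is a handy illustration of desugaring. In the sketch below, the second value is what I would write by hand to mirror the rewrite into flatMap, withFilter, and map calls; the names (DesugarDemo, xs, ys) are just for this example, and the exact tree the compiler produces may differ in details.

```scala
object DesugarDemo {
  val xs = List(1, 2, 3)
  val ys = List(10, 20)

  // What we write
  val sugared: List[Int] =
    for {
      x <- xs
      y <- ys
      if y > 10
    } yield x * y

  // Roughly what the parser turns it into: plain method calls
  val desugared: List[Int] =
    xs.flatMap(x => ys.withFilter(y => y > 10).map(y => x * y))

  def main(args: Array[String]): Unit = {
    println(sugared)   // List(20, 40, 60)
    println(desugared) // List(20, 40, 60)
  }
}
```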

Namer, Typer, Package Objects

Figure: Symbol table

These three phases form one object in the compiler code, as they have many mutual dependencies. First, the namer creates a symbol table. In the next examples, there are two symbol tables: one from namer and the other from typer. The latter phase adds more information to the table about the values inside objects.

Tail calls

The tail calls phase optimizes tail recursion: the recursive call is replaced with a jump, so in bytecode it looks like an ordinary loop.

Figure: Tail calls in action
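Here is a small sketch of a method this phase can optimize, next to a hand-written loop that behaves the way I would expect the optimized version to behave. The names sumTo and sumToLoop are mine; the @tailrec annotation is standard and simply makes the compiler reject the method if the recursive call is not in tail position.

```scala
import scala.annotation.tailrec

object TailCallDemo {
  // The recursive call is the very last thing the method does,
  // so it can be compiled to a jump instead of a new stack frame.
  @tailrec
  def sumTo(n: Int, acc: Long = 0L): Long =
    if (n <= 0) acc else sumTo(n - 1, acc + n)

  // A hand-written equivalent: roughly the shape of the optimized code
  def sumToLoop(n: Int): Long = {
    var i = n
    var acc = 0L
    while (i > 0) {
      acc += i
      i -= 1
    }
    acc
  }

  def main(args: Array[String]): Unit = {
    // A million nested calls would normally overflow the stack
    println(sumTo(1000000))
    println(sumToLoop(1000000))
  }
}
```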

Specialize

The JVM has a very special way of dealing with generics. When you use List[A], in the end you always get a List[Object]. In Java, it is forbidden to use a primitive type as a type argument (List<int> does not compile); it has to be a class type. In Scala it is very similar, even though Int looks like an object: inside a generic collection, values are boxed. The main disadvantage of this solution is that boxed objects take more memory than primitive values, so operations on such lists are slower. To make them faster, you can use the @specialized annotation, which makes the compiler generate extra versions of a class or method that work directly on primitives and avoid boxing. This phase deals with this case. For a more detailed description please visit this site.
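A minimal sketch of the annotation in use follows. The class names Boxed and Unboxed are hypothetical. On Scala 2, I would expect the runtime class name of the specialized instance to carry a suffix (something like $mcI$sp) showing that an Int-only variant was generated; as far as I know, Scala 3 accepts the annotation but ignores it, so treat this as a Scala 2 experiment.

```scala
object SpecializeDemo {
  // Plain generic class: A is erased to Object, so an Int stored here is boxed
  class Boxed[A](val value: A)

  // With @specialized(Int), the Scala 2 compiler also generates an
  // Int-only variant of the class that keeps the value in a primitive field
  class Unboxed[@specialized(Int) A](val value: A)

  def main(args: Array[String]): Unit = {
    val boxed = new Boxed(42)
    val unboxed = new Unboxed(42)
    // Compare the runtime class names of the two instances
    println(boxed.getClass.getName)
    println(unboxed.getClass.getName)
  }
}
```

Specialization trades memory and speed at run time for more generated classes, which is why it is opt-in and usually limited to a few primitive types.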

Erasure and posterasure

Figure: Type erasure

When you use generics or value classes in your code, they are erased during this phase (type erasure was introduced together with generics in Java 5 to keep backwards compatibility). The posterasure phase cleans up unnecessary code, does some optimizations, and unboxes value classes:
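You can observe the effect of erasure from ordinary code. In this sketch (ErasureDemo, Meters, and the helper names are made up for the example), two lists with different element types turn out to share one runtime class, and Meters is a value class of the kind that the compiler can represent as a bare Double in most places.

```scala
object ErasureDemo {
  // A value class: a type-safe wrapper that usually costs no allocation
  final class Meters(val value: Double) extends AnyVal

  def twice(m: Meters): Meters = new Meters(m.value * 2)

  // The element type is gone at run time, so only the list class is compared
  def sameRuntimeClass(a: List[_], b: List[_]): Boolean =
    a.getClass == b.getClass

  def main(args: Array[String]): Unit = {
    val ints = List(1, 2, 3)
    val strings = List("a", "b")
    println(sameRuntimeClass(ints, strings)) // true
    println(twice(new Meters(2.5)).value)    // 5.0
  }
}
```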

The Constructors phase is responsible for translating Scala constructors into JVM-friendly ones. It creates field definitions in a constructor. Later, flatten lifts inner classes to the top level.

JVM

In the Scala 2.12 compiler code, there is only one phase that performs back end actions, unsurprisingly called JVM. In the beginning, it creates an array of primitive types that is demultiplexed by providing a mapping from their symbols to integers. Later on, ClassBTypes are created from symbols, and a few steps later the bytecode is generated. At the end, there is some post-processing, like closure optimizations or eliminating unreachable code. To perform the transformation into bytecode, Scala uses the ASM library.

From one file containing the following:

object Main extends App

Three files with the .class extension are created: Main.class, Main$.class, and Main$delayedInit$body.class. The first represents the class, the second is the companion object, and the last one handles delayed init. The first two files, named after the object, are a representation of the code; the last one lets our program delay its initialization until main runs, so that it can read the parameters passed to the main function.

The created bytecode is quite complicated. To fully see its possibilities, I encourage you to experiment on your own. All you have to do is compile your code and run the javap command with different flags.

Why so long?

Now, you’re probably wondering why it actually takes so long to compile Scala code. Well, the longest phase (not surprisingly) is typer. As I mentioned before, Scala has a rich, complicated type system that needs a lot of time to process. Furthermore, libraries such as Shapeless are based on implicits and macros, which are also time-consuming.
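To get a feeling for the work the typer does, consider this small type class sketch. Nothing here comes from Shapeless; Show, render, and the instances are my own illustrative names. To compile a single call to render, the compiler has to search for an implicit, discover that it needs further implicits, and repeat until the whole chain resolves. Libraries like Shapeless lean on this mechanism much more heavily.

```scala
object ImplicitSearchDemo {
  trait Show[A] { def show(a: A): String }

  object Show {
    implicit val showInt: Show[Int] = new Show[Int] {
      def show(a: Int): String = a.toString
    }

    // Needs a Show[A] to build a Show[List[A]]
    implicit def showList[A](implicit ev: Show[A]): Show[List[A]] =
      new Show[List[A]] {
        def show(as: List[A]): String = as.map(ev.show).mkString("[", ", ", "]")
      }

    // Needs both a Show[A] and a Show[B]
    implicit def showPair[A, B](implicit sa: Show[A], sb: Show[B]): Show[(A, B)] =
      new Show[(A, B)] {
        def show(p: (A, B)): String =
          "(" + sa.show(p._1) + ", " + sb.show(p._2) + ")"
      }
  }

  def render[A](a: A)(implicit s: Show[A]): String = s.show(a)

  def main(args: Array[String]): Unit =
    // Resolved at compile time as showList(showPair(showInt, showList(showInt)))
    println(render(List((1, List(2, 3))))) // [(1, [2, 3])]
}
```

Every extra layer of nesting means another round of implicit search, and all of it happens in the typer phase.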

I compiled one of my projects with the -Xprint:all flag to present the time differences between each phase:

Figure: All phases

Figure: Typer vs non-typer

Scalac’s bottlenecks seem to be implicit search and macro expansion. There are some things that can be done to minimize the problem. At least now I know that when I’m waiting impatiently for the results of compilation, the compiler is probably stuck in the typer phase. The type system, in my opinion, is Scala’s best quality, and I think it is understandable that the phase takes so long.

Yesterday, I found out that the slickless bug was fixed: now compilation takes 3 minutes at worst. They stopped using existential types.

Thanks for reading!