I’ve been learning more about Scala’s type system, and last week, I hit some of its limitations. In this post, I’ll be covering an issue called type erasure. If you’re interested in understanding generic types in Scala better, or in learning how to use reflection, keep reading!
And if you’re seeing a warning along the lines of:
non-variable type argument Int in type pattern Seq[Int] (the underlying
of Seq[Int]) is unchecked since it is eliminated by erasure
definitely keep reading!
Note that this post assumes some knowledge of Scala.
What is Type Erasure?
Let’s say you have the following code:
case class Thing[T](value: T)
def processThing(thing: Thing[_]) = {
  thing match {
    case Thing(value: Int) => "Thing of int"
    case Thing(value: String) => "Thing of string"
    case _ => "Thing of something else"
  }
}
println(processThing(Thing(1)))
println(processThing(Thing("hello")))
Here, we have a generic class called Thing that contains a single value of some generic type T. We also have a function called processThing that takes a Thing. The _ represents an “existential type”, meaning that you can pass a Thing of any type into the function. processThing pattern matches on the inner type of Thing, and returns a String depending on the type.
This seems reasonable: it’s not crazy that Scala, when executing this pattern match, could run through each of the cases and check the type of the instance of value, which is known at runtime.
And as you’d expect, this code prints out:
$ sbt run
...
Thing of int
Thing of string
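To see why these two cases can be checked at runtime, here is a minimal sketch (the object name RuntimeClassDemo and the helper runtimeClassName are made up for illustration, they are not part of the original example). It only looks at the class that the JVM records on the stored object itself:

```scala
object RuntimeClassDemo {
  case class Thing[T](value: T)

  // Hypothetical helper: returns the runtime class name of whatever is stored
  // in the Thing. The static type is unknown here (Thing[_]), so this relies
  // only on information the JVM keeps on the object itself.
  def runtimeClassName(thing: Thing[_]): String =
    thing.value.asInstanceOf[AnyRef].getClass.getName

  def main(args: Array[String]): Unit = {
    println(runtimeClassName(Thing(1)))       // java.lang.Integer (the Int is boxed)
    println(runtimeClassName(Thing("hello"))) // java.lang.String
  }
}
```

An Int stored in a generic field is boxed as a java.lang.Integer, so both the Int case and the String case have a concrete runtime class to test against.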
Let’s try something else
We’re going to add another case to our pattern match now:
def processThing(thing: Thing[_]) = {
  thing match {
    case Thing(value: Int) => "Thing of int"
    case Thing(value: String) => "Thing of string"
    case Thing(value: Seq[Int]) => "Thing of Seq[Int]"
    case _ => "Thing of something else"
  }
}
Again, this seems pretty reasonable. If we can check the type of the instance to match it against an Int or a String, we should be able to match it against a Seq[Int].
When we compile this, however, we see the following warning:
non-variable type argument Int in type pattern Seq[Int] (the underlying
of Seq[Int]) is unchecked since it is eliminated by erasure
We’ve hit an instance of what’s called “type erasure”. What this means is that when Scala is compiled, if there are generic types in the program, information about the specific type arguments is checked at compile time, but is not available for the runtime to use.
What does it mean for types to be erased?
To be more specific about what’s happening here–there is a distinction between how generic types are treated at compile-time in a Scala program, and how they are treated at runtime. As an example, say you have the following:
val seq : Seq[String] = Seq(1,2,3)
The Scala compiler will check that the value you are assigning to seq matches the type we are asserting for it, which is Seq[String]. Since Seq(1,2,3) is a Seq[Int], we will get a type mismatch error in compilation.
After these checks pass (for example, once the declaration is corrected to Seq[Int]), the compiler removes the specific type arguments from generics. As far as the Java bytecode that is produced is concerned, the declaration effectively becomes:
// not valid Scala; shown only to illustrate the erased type
val seq : Seq = Seq(1,2,3)
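One way to observe the erased type argument directly is to compare runtime classes. This is a small sketch of my own (the object and value names are made up), and it uses List rather than Seq so that the concrete runtime class is the same on both sides:

```scala
object ErasedClassDemo {
  val ints: List[Int] = List(1, 2, 3)
  val strings: List[String] = List("hello", "yo")

  // Both lists are backed by the same runtime class; the element type
  // is not part of it.
  val sameRuntimeClass: Boolean = ints.getClass == strings.getClass

  def main(args: Array[String]): Unit =
    println(sameRuntimeClass) // true
}
```

The compiler treats List[Int] and List[String] as different types, but at runtime there is just one list class and nothing on it records the element type.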
Because of the compile-time checks, we get nice, type-safe code. However, the problem with the “underlying” types getting erased is that you can no longer do runtime checks on those types.
For instance, if you were to try:
val seq : Seq[Int] = Seq(1,2,3)
Then:
seq.isInstanceOf[Seq[Int]]
is obviously true.
seq.isInstanceOf[Int]
is obviously false.
seq.isInstanceOf[Seq[String]]
This strangely also returns true! Again, this is because this code compiles to seq.isInstanceOf[Seq], since the “underlying” types are erased. To elaborate on this a little further, typing seq.isInstanceOf[Seq[String]] does not actually violate any type checks, so it’s valid Scala code. Scala will produce warnings for this, however.
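Here is the same effect wrapped in a function that accepts Any, plus one possible way to get a real answer by inspecting the elements. Both helper names are hypothetical, and the element check is my own illustration rather than anything from the Scala docs. Note that it cannot say anything useful about an empty sequence:

```scala
object IsInstanceOfDemo {
  // The type argument String is erased, so this only tests "is it a Seq?".
  // The compiler reports an unchecked warning for this line.
  def looksLikeSeqOfString(x: Any): Boolean = x.isInstanceOf[Seq[String]]

  // Hypothetical alternative: check every element's runtime class instead.
  // An empty Seq passes vacuously, since there is no element to inspect.
  def allStrings(x: Any): Boolean = x match {
    case s: Seq[_] => s.forall(_.isInstanceOf[String])
    case _         => false
  }

  def main(args: Array[String]): Unit = {
    println(looksLikeSeqOfString(Seq(1, 2, 3))) // true, because of erasure
    println(allStrings(Seq(1, 2, 3)))           // false
    println(allStrings(Seq("hello", "yo")))     // true
  }
}
```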
How serious of an issue is this?
Alright, so the next question is how serious an issue it is if you start seeing warnings like this. The problem now is that even if the code compiles, your program’s behavior may not be what you expect. Let’s revisit our processThing example:
def processThing(thing: Thing[_]) = {
  thing match {
    case Thing(value: Int) => "Thing of int"
    case Thing(value: String) => "Thing of string"
    case Thing(value: Seq[Int]) => "Thing of Seq[Int]"
    case _ => "Thing of something else"
  }
}
If we run this on the following line:
processThing(Thing(Seq(1,2,3)))
This yields the string “Thing of Seq[Int]”. If we run this line:
processThing(Thing(Seq("hello", "yo")))
It also yields the string “Thing of Seq[Int]”.
Annoying, but again, we now know why it happens! The line:
case Thing(value: Seq[Int]) => "Thing of Seq[Int]"
compiles down to:
// not valid code
case Thing(value: Seq) => "Thing of Seq[Int]"
This behavior is surprising for the developer, but things can be even worse when, in the pattern match, you make assumptions about what kind of sequence you have. For instance, if processThing is implemented as:
def processThing(thing: Thing[_]) = {
  thing match {
    case Thing(value: Int) => "Thing of int"
    case Thing(value: String) => "Thing of string"
    case Thing(value: Seq[Int]) => "ints sum to " + value.sum
    case _ => "Thing of something else"
  }
}
Here, value.sum is only valid if value is a Seq of numeric types, like Int. If value is a Seq[String], this will blow up with the error:
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
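If you want to reproduce the failure without crashing your program, a sketch like the following wraps the call in scala.util.Try. The object name SumFailureDemo and the result value are made up for this illustration; processThing is the version from above:

```scala
import scala.util.Try

object SumFailureDemo {
  case class Thing[T](value: T)

  def processThing(thing: Thing[_]): String = thing match {
    case Thing(value: Int)      => "Thing of int"
    case Thing(value: String)   => "Thing of string"
    case Thing(value: Seq[Int]) => "ints sum to " + value.sum // unchecked warning here
    case _                      => "Thing of something else"
  }

  def main(args: Array[String]): Unit = {
    println(processThing(Thing(Seq(1, 2, 3)))) // ints sum to 6

    // The Seq[String] also lands in the Seq[Int] case; the failure only
    // shows up once sum tries to treat a String as an Int.
    val result = Try(processThing(Thing(Seq("hello", "yo"))))
    println(result.isFailure) // true
  }
}
```

The pattern match itself succeeds. The exception is thrown later, inside sum, which is part of what makes this kind of bug hard to trace.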
Can we solve this with reflection?
Scala provides a reflection API that allows you to inspect the types of your instances at runtime, which can be used to accomplish some of what we want.
TypeTag
The Scala reflection API has a mechanism called TypeTag that allows you to inspect the type of instances, including the types of generics, at runtime. As the docs state, “TypeTags can be thought of as objects which carry along all type information available at compile time, to runtime”.
Let’s see how we can rewrite our processThing function to use TypeTag:
import scala.reflect.runtime.universe._
...
def processThing[T: TypeTag](thing: Thing[T]) = {
  typeOf[T] match {
    case t if t =:= typeOf[Seq[Int]] => "Thing of Seq[Int]"
    case t if t =:= typeOf[Seq[String]] => "Thing of Seq[String]"
    case t if t =:= typeOf[Int] => "Thing of Int"
    case _ => "Thing of other"
  }
}
Alright, let’s unpack what’s happening here:
- We added a generic type to the function that’s now associated with TypeTag. This is an indicator to the compiler that it should capture the type information about T when processThing is invoked.
- We now pattern match on typeOf[T], instead of on Thing itself.
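To make the first point a bit more concrete: the [T: TypeTag] syntax is a context bound, which is shorthand for an extra implicit parameter that the compiler fills in at each call site. Here is a rough sketch of the two spellings side by side. The names describe and describeExplicit are made up, and this assumes Scala 2 with the scala-reflect library on the classpath:

```scala
import scala.reflect.runtime.universe._

object ContextBoundDemo {
  case class Thing[T](value: T)

  // Context-bound form, in the same style as processThing.
  def describe[T: TypeTag](thing: Thing[T]): String =
    typeOf[T].toString

  // Roughly the expanded form: the TypeTag arrives as an implicit parameter,
  // supplied by the compiler wherever the function is called.
  def describeExplicit[T](thing: Thing[T])(implicit tag: TypeTag[T]): String =
    tag.tpe.toString

  def main(args: Array[String]): Unit = {
    val thing = Thing(Seq(1, 2, 3))
    println(describe(thing) == describeExplicit(thing)) // true
  }
}
```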
This works correctly, and now we can actually distinguish between a Thing containing a Seq[Int], and one containing a Seq[String].
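Two caveats are worth trying out for yourself. The sketch below uses made-up helper names (exact and conforms) and again assumes Scala 2 with scala-reflect available. First, =:= tests for exactly the same type, so a List[Int] does not match Seq[Int] unless you use the conformance test <:< instead. Second, the TypeTag describes the static type the compiler saw at the call site, not the runtime contents:

```scala
import scala.reflect.runtime.universe._

object TypeTagCaveatsDemo {
  case class Thing[T](value: T)

  // Exact type equality, as used in processThing.
  def exact[T: TypeTag](thing: Thing[T]): Boolean =
    typeOf[T] =:= typeOf[Seq[Int]]

  // Subtype conformance: also accepts List[Int], Vector[Int], and so on.
  def conforms[T: TypeTag](thing: Thing[T]): Boolean =
    typeOf[T] <:< typeOf[Seq[Int]]

  def main(args: Array[String]): Unit = {
    println(exact(Thing(Seq(1, 2, 3))))      // true
    println(exact(Thing(List(1, 2, 3))))     // false: T is List[Int]
    println(conforms(Thing(List(1, 2, 3))))  // true
    println(exact(Thing[Any](Seq(1, 2, 3)))) // false: the static type is Any
  }
}
```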
What if we want to operate on the value in Thing?
Now, remember the example where we actually do something with the value inside of Thing? Can we use reflection to do that too?
Well, now that we have the ability to check the types of things, we can add in a type check to ensure that we’re hitting the cases that we care about.
def processThing[T : TypeTag](thing: Thing[T]) = {
  thing match {
    case Thing(value: Int) => "Thing of int " + value.toString
    case Thing(value: Seq[Int]) if typeOf[T] =:= typeOf[Seq[Int]] => "Thing of seq of int" + value.sum
    case _ => "Thing of something else"
  }
}
This code, with the additional guard, typeOf[T] =:= typeOf[Seq[Int]], can now distinguish between a Seq[Int] and a Seq[String]. We finally have achieved the behavior that we want.
However, during compilation, we still see the original warning that we were getting:
non-variable type argument Int in type pattern Seq[Int] (the underlying
of Seq[Int]) is unchecked since it is eliminated by erasure
This makes sense: adding in the if guard ensures that, in practice, the value that reaches that case will in fact be a Seq[Int], but because that guard code is executed at runtime, it is impossible for the compiler to assert at compile time that value is in fact a Seq[Int].
Because of the way Scala type erasure works, this is the best we can do in this case. You probably want to add an @unchecked annotation here as well to suppress the warning:
case Thing(x: Seq[Int] @unchecked) => ...
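Putting the guard and the annotation together, the full function might look something like this. It is a sketch under the same assumptions as before (Scala 2, scala-reflect on the classpath); the wrapping object name is made up, and I added a space before the sum so the output is easier to read:

```scala
import scala.reflect.runtime.universe._

object UncheckedGuardDemo {
  case class Thing[T](value: T)

  def processThing[T: TypeTag](thing: Thing[T]): String = thing match {
    case Thing(value: Int) =>
      "Thing of int " + value.toString
    // @unchecked silences the erasure warning; the guard is what actually
    // makes sure we only get here for a Seq[Int].
    case Thing(value: Seq[Int] @unchecked) if typeOf[T] =:= typeOf[Seq[Int]] =>
      "Thing of seq of int " + value.sum
    case _ =>
      "Thing of something else"
  }

  def main(args: Array[String]): Unit = {
    println(processThing(Thing(1)))                   // Thing of int 1
    println(processThing(Thing(Seq(1, 2, 3))))        // Thing of seq of int 6
    println(processThing(Thing(Seq("hello", "yo")))) // Thing of something else
  }
}
```

Keep in mind that @unchecked only hides the warning. If you drop the guard later, the annotation will happily hide the original bug too.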
Other options
It’s worth noting that this is really only a problem because the type T of Thing[T] in processThing is completely unconstrained. If you knew in advance what possible types you might stick into a Thing, you could define a separate type, with subtypes that you could match on instead:
sealed trait ThingValue
case class SeqIntThingValue(value: Seq[Int]) extends ThingValue
case class SeqStringThingValue(value: Seq[String]) extends ThingValue
def processThing[T <: ThingValue](thing: Thing[T]) = {
  thing match {
    case Thing(SeqIntThingValue(value: Seq[Int])) => "Seq of Int" + value.sum
    case _ => "Other thing"
  }
}
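For completeness, here is a self-contained variation on the wrapper approach that handles both wrappers and shows how it would be called. The object name and the output strings are my own choices. No reflection is needed, because each wrapper is a distinct class that survives to runtime:

```scala
object SealedThingDemo {
  case class Thing[T](value: T)

  sealed trait ThingValue
  case class SeqIntThingValue(value: Seq[Int]) extends ThingValue
  case class SeqStringThingValue(value: Seq[String]) extends ThingValue

  // The match is on the wrapper class, and the wrapper's field already has
  // a static element type, so there is nothing left for erasure to hide.
  def processThing[T <: ThingValue](thing: Thing[T]): String = thing match {
    case Thing(SeqIntThingValue(ints))       => "Seq of Int " + ints.sum
    case Thing(SeqStringThingValue(strings)) => "Seq of String " + strings.mkString(",")
  }

  def main(args: Array[String]): Unit = {
    println(processThing(Thing(SeqIntThingValue(Seq(1, 2, 3)))))
    // Seq of Int 6
    println(processThing(Thing(SeqStringThingValue(Seq("hello", "yo")))))
    // Seq of String hello,yo
  }
}
```

The trade-off is a bit of extra wrapping at every call site in exchange for checks that happen at compile time.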
Why does Scala do this?
This seems like an annoying constraint, and given that in general, in Scala, you can pattern match on arbitrary types of values, it is fairly surprising.
Type erasure is required in Scala because Scala is a JVM-based language and needs to interop with Java. Java in its early iterations did not actually have generics, and when generics were added, they were implemented with type erasure so that the bytecode would be interoperable with bytecode generated from older versions of Java.
For that type information to be preserved in the bytecode, changes would have been needed to how JVM bytecode actually works. I’ve left some links at the bottom of this article that go into more detail.
Conclusion
Scala has a powerful type system, but because of quirks of the JVM, there are still some limits.
Hope that this post helps you solve your problems with erased types!
Further Reading
More on Type Erasure
- https://medium.com/@sinisalouc/overcoming-type-erasure-in-scala-8f2422070d20
- http://www.angelikalanger.com/GenericsFAQ/FAQSections/TechnicalDetails.html#FAQ001
TypeTag
- https://stackoverflow.com/questions/12218641/scala-what-is-a-typetag-and-how-do-i-use-it
- https://docs.scala-lang.org/overviews/reflection/overview.html
The drawbacks of using runtime information in an unconstrained way