
NullPointer #64

Open
iainstarks opened this issue Dec 7, 2019 · 5 comments

iainstarks commented Dec 7, 2019

We are using clustered Ehcache 3.8.1, and after some time (about 6 hours, under some load) we see the exception below. The cache then appears to lose entries, and the exception repeats roughly every second.

About 15 minutes before the repeating error there was a single warning, which is also pasted below. Earlier still there was a single ERROR; I've pasted that below too.

Any suggestions as to what the cause might be would be greatly appreciated.

Repeating error:

2019-12-06 22:41:55,843 [WorkerThread(request_processor_stage, 2, 2)] ERROR org.ehcache.clustered.server.store.ClusterTierActiveEntity - Unexpected exception raised during operation: APPEND
java.lang.NullPointerException: null
at org.terracotta.offheapstore.paging.OffHeapStorageArea.readInt(OffHeapStorageArea.java:173)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.isHead(OffHeapChainStorageEngine.java:776)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.findHead(OffHeapChainStorageEngine.java:769)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.access$1300(OffHeapChainStorageEngine.java:42)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine$StorageOwner.evictAtAddress(OffHeapChainStorageEngine.java:783)
at org.terracotta.offheapstore.paging.OffHeapStorageArea.release(OffHeapStorageArea.java:579)
at org.terracotta.offheapstore.paging.OffHeapStorageArea.shrink(OffHeapStorageArea.java:681)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.shrink(OffHeapChainStorageEngine.java:340)
at org.terracotta.offheapstore.AbstractLockedOffHeapHashMap.shrink(AbstractLockedOffHeapHashMap.java:500)
at org.ehcache.clustered.server.offheap.OffHeapChainMap.shrink(OffHeapChainMap.java:345)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.tryShrinkOthers(OffHeapServerStore.java:242)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.handleOversizeMappingException(OffHeapServerStore.java:212)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.consumeOversizeMappingException(OffHeapServerStore.java:196)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.append(OffHeapServerStore.java:124)
at org.ehcache.clustered.server.ServerStoreImpl.append(ServerStoreImpl.java:86)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeServerStoreOperation(ClusterTierActiveEntity.java:425)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActiveInternal(ClusterTierActiveEntity.java:329)
at org.terracotta.client.message.tracker.OOOMessageHandlerImpl.invoke(OOOMessageHandlerImpl.java:75)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:317)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:134)
at com.tc.objectserver.entity.ManagedEntityImpl.performAction(ManagedEntityImpl.java:879)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$invoke$12(ManagedEntityImpl.java:591)
at java.util.Optional.ifPresent(Optional.java:159)
at com.tc.objectserver.entity.ManagedEntityImpl.invoke(ManagedEntityImpl.java:591)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$processInvokeRequest$5(ManagedEntityImpl.java:376)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1311)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1247)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.invoke(RequestProcessor.java:182)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.run(RequestProcessor.java:172)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:28)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:24)
at com.tc.async.impl.StageImpl.lambda$null$1(StageImpl.java:115)
at com.tc.async.impl.MonitoringEventCreator.lambda$createEvent$0(MonitoringEventCreator.java:77)
at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:307)

Warning:

2019-12-06 22:26:29,634 [WorkerThread(request_processor_stage, 3, 3)] WARN org.terracotta.offheapstore.OffHeapHashMap - Expanding reprobe sequence from 512 slots to 1024 slots [load-factor=0.48480225]

Single ERROR:

2019-12-06 04:53:40,671 [WorkerThread(request_processor_stage, 4, 4)] ERROR org.ehcache.clustered.server.store.ClusterTierActiveEntity - Unexpected exception raised during operation: ITERATOR_OPEN
java.lang.NullPointerException: null
at org.terracotta.offheapstore.paging.OffHeapStorageArea.readInt(OffHeapStorageArea.java:173)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.readElementLength(OffHeapChainStorageEngine.java:162)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.access$700(OffHeapChainStorageEngine.java:42)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine$AttachedInternalChain.readElementBuffer(OffHeapChainStorageEngine.java:687)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine$AttachedInternalChain.detach(OffHeapChainStorageEngine.java:473)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap$1.create(OffHeapChainMap.java:368)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap$1.create(OffHeapChainMap.java:362)
at org.terracotta.offheapstore.OffHeapHashMap$HashIterator.&lt;init&gt;(OffHeapHashMap.java:1390)
at org.terracotta.offheapstore.OffHeapHashMap$EntryIterator.&lt;init&gt;(OffHeapHashMap.java:1525)
at org.terracotta.offheapstore.AbstractLockedOffHeapHashMap$LockedEntryIterator.&lt;init&gt;(AbstractLockedOffHeapHashMap.java:385)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap$1.&lt;init&gt;(OffHeapChainMap.java:362)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap.detachedEntryIterator(OffHeapChainMap.java:362)
at org.ehcache.clustered.server.offheap.OffHeapChainMap.iterator(OffHeapChainMap.java:243)
at org.ehcache.clustered.server.offheap.OffHeapServerStore$1.getNextIterator(OffHeapServerStore.java:374)
at org.ehcache.clustered.server.offheap.OffHeapServerStore$AggregateIterator.hasNext(OffHeapServerStore.java:410)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.iteratorBatch(ClusterTierActiveEntity.java:616)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeServerStoreOperation(ClusterTierActiveEntity.java:518)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActiveInternal(ClusterTierActiveEntity.java:329)
at org.terracotta.client.message.tracker.OOOMessageHandlerImpl.invoke(OOOMessageHandlerImpl.java:79)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:317)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:134)
at com.tc.objectserver.entity.ManagedEntityImpl.performAction(ManagedEntityImpl.java:879)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$invoke$12(ManagedEntityImpl.java:591)
at java.util.Optional.ifPresent(Optional.java:159)
at com.tc.objectserver.entity.ManagedEntityImpl.invoke(ManagedEntityImpl.java:591)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$processInvokeRequest$5(ManagedEntityImpl.java:376)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1311)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1247)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.invoke(RequestProcessor.java:182)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.run(RequestProcessor.java:172)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:28)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:24)
at com.tc.async.impl.StageImpl.lambda$null$1(StageImpl.java:115)
at com.tc.async.impl.MonitoringEventCreator.lambda$createEvent$0(MonitoringEventCreator.java:77)
at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:307)

Terracotta is configured with a single off-heap resource:

<ohr:offheap-resources>
  <ohr:resource name="offheap" unit="MB">2048</ohr:resource>
</ohr:offheap-resources>

and the cache being exercised:

<ehcache:cache alias="mycaache">
  <ehcache:key-type copier="org.ehcache.impl.copy.SerializingCopier">java.lang.String</ehcache:key-type>
  <ehcache:value-type copier="org.ehcache.impl.copy.SerializingCopier">MyCacheValue</ehcache:value-type>
  <ehcache:expiry>
    <ehcache:tti unit="days">1</ehcache:tti>
  </ehcache:expiry>
  <ehcache:resources>
    <ehcache:heap unit="MB">100</ehcache:heap>
    <ehcache:offheap unit="MB">150</ehcache:offheap>
    <terracotta:clustered-dedicated unit="MB">200</terracotta:clustered-dedicated>
  </ehcache:resources>
  <terracotta:clustered-store consistency="strong"/>
</ehcache:cache>
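For readers more familiar with the programmatic API, a roughly equivalent configuration using the Ehcache 3 clustered client builders might look like the sketch below. This is an untested illustration, not the poster's code: the cluster URI and class name `ClusteredCacheConfigSketch` are invented, the `SerializingCopier` settings from the XML are omitted, `MyCacheValue` is the value type named in the XML, and the exact builder signatures (e.g. `autoCreate`) vary slightly between 3.x releases.

```java
import java.net.URI;
import java.time.Duration;

import org.ehcache.CacheManager;
import org.ehcache.clustered.client.config.builders.ClusteredResourcePoolBuilder;
import org.ehcache.clustered.client.config.builders.ClusteredStoreConfigurationBuilder;
import org.ehcache.clustered.client.config.builders.ClusteringServiceConfigurationBuilder;
import org.ehcache.clustered.common.Consistency;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ExpiryPolicyBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.MemoryUnit;

public class ClusteredCacheConfigSketch {
  public static void main(String[] args) {
    CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
        // placeholder URI; point this at the real Terracotta stripe
        .with(ClusteringServiceConfigurationBuilder
            .cluster(URI.create("terracotta://localhost:9410/my-entity"))
            .autoCreate(c -> c))
        .withCache("mycaache",
            CacheConfigurationBuilder
                .newCacheConfigurationBuilder(String.class, MyCacheValue.class,
                    ResourcePoolsBuilder.newResourcePoolsBuilder()
                        .heap(100, MemoryUnit.MB)
                        .offheap(150, MemoryUnit.MB)
                        // dedicated pool carved out of the server's "offheap" resource
                        .with(ClusteredResourcePoolBuilder
                            .clusteredDedicated("offheap", 200, MemoryUnit.MB)))
                .withExpiry(ExpiryPolicyBuilder.timeToIdleExpiration(Duration.ofDays(1)))
                .withService(ClusteredStoreConfigurationBuilder
                    .withConsistency(Consistency.STRONG)))
        .build(true);
  }
}
```

This is a configuration fragment: it only compiles against the Ehcache clustered client jars and needs a running Terracotta server to actually execute.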
chrisdennis self-assigned this Dec 9, 2019
chrisdennis added the bug label Dec 9, 2019
chrisdennis (Member) commented:

@iainstarks I can certainly see the bug that caused the initial error. The second error is 'harmless', although it may be a symptom of other things happening. However, I'm not fully confident that the bug behind the initial error is also the cause of the subsequent exceptions.

If you could describe more about how you use the cache, that might help. In particular:

  • What operations do you perform against the cache?
  • What are the keys and values: types, sizes, etc.?

If you could upload a full set of logs somewhere so I can look through them, that would be useful too. In the meantime I can build you a copy of Ehcache 3 with this first fix applied, and we can cross our fingers that I'm not missing something and that this fixes both problems.

iainstarks (Author) commented:

Hi Chris,
Thanks for the response.
The cache holds some location info, so the key is a String and the value is a POJO with the following content:

public class LocationCheck implements Serializable {

    private static final long serialVersionUID = 3033653990506158489L;

    private final LocationCheckType type;
    private final String alias;
    private final String region;
    private final long lastChecked;
    private final String result;
    private final String lat;
    private final String lng;
    private final long distanceInMiles;
    private final boolean passed;
    private final int delay;
    private final StatusInfo statusInfo;

    // constructor and accessors omitted
}

LocationCheckType is an enum. StatusInfo is a wrapper around a List of ErrorInfo, which is a class with two Strings. The list will usually be empty or have a single entry, so the values are not very big.
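To put a rough number on "not very big", here is a quick Java-serialization round trip of a stand-in value shaped like the class above. Field names are copied from the snippet, but the enum constants, the ErrorInfo fields, the sample values, and the `SizeCheck` class are all invented for illustration; the real serialized size will differ somewhat.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Simplified stand-ins for the classes described above.
enum LocationCheckType { GPS, IP }
record ErrorInfo(String code, String message) implements Serializable {}
record StatusInfo(List<ErrorInfo> errors) implements Serializable {}
record LocationCheck(LocationCheckType type, String alias, String region,
                     long lastChecked, String result, String lat, String lng,
                     long distanceInMiles, boolean passed, int delay,
                     StatusInfo statusInfo) implements Serializable {}

public class SizeCheck {
  public static void main(String[] args) throws IOException {
    LocationCheck value = new LocationCheck(LocationCheckType.GPS, "alias-1",
        "eu-west", System.currentTimeMillis(), "OK", "51.5", "-0.12",
        3, true, 0, new StatusInfo(List.of()));

    // Serialize once and report the byte count.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    System.out.println("serialized size in bytes: " + bytes.size());
  }
}
```

On a typical JVM this lands in the hundreds of bytes, dominated by the class descriptors rather than the payload, which is consistent with "a few thousand entries" fitting comfortably in the configured pools.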

I think a log will be small enough to email. I'll try that.

iainstarks (Author) commented:

Drop me an email to iainstarks@gmail.com and I'll send you the logs.

As for operations: we do a put, then a few gets, and then most likely a remove. We might repeat the put if the info changes. The cache has a TTI of one day.

I would guess there would be a few thousand entries.

There are a few other caches, but I would think that one is the likely culprit.

chrisdennis (Member) commented:

I think I know what the bigger issue is and have a fix lined up. More accurately, I've found a real issue, and it might be the cause of your bug. Do you have a testing environment where we can try out a build to see if this fixes the issue for you?

iainstarks (Author) commented:

Replied by email, but just to reiterate: no, this issue has only appeared for us on one production platform. We've tried fairly hard to reproduce it elsewhere, with load tests and general testing, and it has also not occurred on our other production environments. If you have some pointers as to what might cause it and how we might reproduce it, that would be great. I'd really like to be able to do that so we could have some confidence that the issue is fixed and won't happen again. We're a little backed into a corner otherwise, as you can imagine.
