
NullPointer #64

Open
iainstarks opened this issue Dec 7, 2019 · 5 comments

iainstarks commented Dec 7, 2019

We are using clustered Ehcache 3.8.1, and after some time (about 6 hours, under some load) we see the exception below. The cache then appears to lose entries, and the exception repeats roughly every second.

About 15 minutes before the repeating error there was a single warning, which is also pasted below. Earlier still there was a single ERROR; I've pasted that below too.

Any suggestions as to what the cause might be would be greatly appreciated.

Repeating error:

2019-12-06 22:41:55,843 [WorkerThread(request_processor_stage, 2, 2)] ERROR org.ehcache.clustered.server.store.ClusterTierActiveEntity - Unexpected exception raised during operation: APPEND
java.lang.NullPointerException: null
at org.terracotta.offheapstore.paging.OffHeapStorageArea.readInt(OffHeapStorageArea.java:173)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.isHead(OffHeapChainStorageEngine.java:776)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.findHead(OffHeapChainStorageEngine.java:769)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.access$1300(OffHeapChainStorageEngine.java:42)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine$StorageOwner.evictAtAddress(OffHeapChainStorageEngine.java:783)
at org.terracotta.offheapstore.paging.OffHeapStorageArea.release(OffHeapStorageArea.java:579)
at org.terracotta.offheapstore.paging.OffHeapStorageArea.shrink(OffHeapStorageArea.java:681)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.shrink(OffHeapChainStorageEngine.java:340)
at org.terracotta.offheapstore.AbstractLockedOffHeapHashMap.shrink(AbstractLockedOffHeapHashMap.java:500)
at org.ehcache.clustered.server.offheap.OffHeapChainMap.shrink(OffHeapChainMap.java:345)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.tryShrinkOthers(OffHeapServerStore.java:242)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.handleOversizeMappingException(OffHeapServerStore.java:212)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.consumeOversizeMappingException(OffHeapServerStore.java:196)
at org.ehcache.clustered.server.offheap.OffHeapServerStore.append(OffHeapServerStore.java:124)
at org.ehcache.clustered.server.ServerStoreImpl.append(ServerStoreImpl.java:86)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeServerStoreOperation(ClusterTierActiveEntity.java:425)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActiveInternal(ClusterTierActiveEntity.java:329)
at org.terracotta.client.message.tracker.OOOMessageHandlerImpl.invoke(OOOMessageHandlerImpl.java:75)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:317)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:134)
at com.tc.objectserver.entity.ManagedEntityImpl.performAction(ManagedEntityImpl.java:879)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$invoke$12(ManagedEntityImpl.java:591)
at java.util.Optional.ifPresent(Optional.java:159)
at com.tc.objectserver.entity.ManagedEntityImpl.invoke(ManagedEntityImpl.java:591)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$processInvokeRequest$5(ManagedEntityImpl.java:376)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1311)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1247)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.invoke(RequestProcessor.java:182)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.run(RequestProcessor.java:172)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:28)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:24)
at com.tc.async.impl.StageImpl.lambda$null$1(StageImpl.java:115)
at com.tc.async.impl.MonitoringEventCreator.lambda$createEvent$0(MonitoringEventCreator.java:77)
at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:307)

Warning:

2019-12-06 22:26:29,634 [WorkerThread(request_processor_stage, 3, 3)] WARN org.terracotta.offheapstore.OffHeapHashMap - Expanding reprobe sequence from 512 slots to 1024 slots [load-factor=0.48480225]

Single ERROR:

2019-12-06 04:53:40,671 [WorkerThread(request_processor_stage, 4, 4)] ERROR org.ehcache.clustered.server.store.ClusterTierActiveEntity - Unexpected exception raised during operation: ITERATOR_OPEN
java.lang.NullPointerException: null
at org.terracotta.offheapstore.paging.OffHeapStorageArea.readInt(OffHeapStorageArea.java:173)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.readElementLength(OffHeapChainStorageEngine.java:162)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine.access$700(OffHeapChainStorageEngine.java:42)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine$AttachedInternalChain.readElementBuffer(OffHeapChainStorageEngine.java:687)
at org.ehcache.clustered.server.offheap.OffHeapChainStorageEngine$AttachedInternalChain.detach(OffHeapChainStorageEngine.java:473)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap$1.create(OffHeapChainMap.java:368)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap$1.create(OffHeapChainMap.java:362)
at org.terracotta.offheapstore.OffHeapHashMap$HashIterator.&lt;init&gt;(OffHeapHashMap.java:1390)
at org.terracotta.offheapstore.OffHeapHashMap$EntryIterator.&lt;init&gt;(OffHeapHashMap.java:1525)
at org.terracotta.offheapstore.AbstractLockedOffHeapHashMap$LockedEntryIterator.&lt;init&gt;(AbstractLockedOffHeapHashMap.java:385)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap$1.&lt;init&gt;(OffHeapChainMap.java:362)
at org.ehcache.clustered.server.offheap.OffHeapChainMap$HeadMap.detachedEntryIterator(OffHeapChainMap.java:362)
at org.ehcache.clustered.server.offheap.OffHeapChainMap.iterator(OffHeapChainMap.java:243)
at org.ehcache.clustered.server.offheap.OffHeapServerStore$1.getNextIterator(OffHeapServerStore.java:374)
at org.ehcache.clustered.server.offheap.OffHeapServerStore$AggregateIterator.hasNext(OffHeapServerStore.java:410)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.iteratorBatch(ClusterTierActiveEntity.java:616)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeServerStoreOperation(ClusterTierActiveEntity.java:518)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActiveInternal(ClusterTierActiveEntity.java:329)
at org.terracotta.client.message.tracker.OOOMessageHandlerImpl.invoke(OOOMessageHandlerImpl.java:79)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:317)
at org.ehcache.clustered.server.store.ClusterTierActiveEntity.invokeActive(ClusterTierActiveEntity.java:134)
at com.tc.objectserver.entity.ManagedEntityImpl.performAction(ManagedEntityImpl.java:879)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$invoke$12(ManagedEntityImpl.java:591)
at java.util.Optional.ifPresent(Optional.java:159)
at com.tc.objectserver.entity.ManagedEntityImpl.invoke(ManagedEntityImpl.java:591)
at com.tc.objectserver.entity.ManagedEntityImpl.lambda$processInvokeRequest$5(ManagedEntityImpl.java:376)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1311)
at com.tc.objectserver.entity.ManagedEntityImpl$SchedulingRunnable.accept(ManagedEntityImpl.java:1247)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.invoke(RequestProcessor.java:182)
at com.tc.objectserver.entity.RequestProcessor$EntityRequest.run(RequestProcessor.java:172)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:28)
at com.tc.objectserver.entity.RequestProcessorHandler.handleEvent(RequestProcessorHandler.java:24)
at com.tc.async.impl.StageImpl.lambda$null$1(StageImpl.java:115)
at com.tc.async.impl.MonitoringEventCreator.lambda$createEvent$0(MonitoringEventCreator.java:77)
at com.tc.async.impl.StageImpl$WorkerThread.run(StageImpl.java:307)

Terracotta is configured with a single off-heap resource:

<ohr:offheap-resources>
  <ohr:resource name="offheap" unit="MB">2048</ohr:resource>
</ohr:offheap-resources>

and the cache being exercised:

<ehcache:cache alias="mycaache">
  <ehcache:key-type copier="org.ehcache.impl.copy.SerializingCopier">java.lang.String</ehcache:key-type>
  <ehcache:value-type copier="org.ehcache.impl.copy.SerializingCopier">MyCacheValue</ehcache:value-type>
  <ehcache:expiry>
    <ehcache:tti unit="days">1</ehcache:tti>
  </ehcache:expiry>
  <ehcache:resources>
    <ehcache:heap unit="MB">100</ehcache:heap>
    <ehcache:offheap unit="MB">150</ehcache:offheap>
    <terracotta:clustered-dedicated unit="MB">200</terracotta:clustered-dedicated>
  </ehcache:resources>
  <terracotta:clustered-store consistency="strong"/>
</ehcache:cache>
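For readers more familiar with the programmatic API, a roughly equivalent configuration using the Ehcache 3 clustered client builders might look like the sketch below. This is an untested illustration, not the poster's code: the cluster URI and class name `ClusteredCacheConfigSketch` are invented, the `SerializingCopier` settings from the XML are omitted, `MyCacheValue` is the value type named in the XML, and the exact builder signatures (e.g. `autoCreate`) vary slightly between 3.x releases.

```java
import java.net.URI;
import java.time.Duration;

import org.ehcache.CacheManager;
import org.ehcache.clustered.client.config.builders.ClusteredResourcePoolBuilder;
import org.ehcache.clustered.client.config.builders.ClusteredStoreConfigurationBuilder;
import org.ehcache.clustered.client.config.builders.ClusteringServiceConfigurationBuilder;
import org.ehcache.clustered.common.Consistency;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ExpiryPolicyBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.MemoryUnit;

public class ClusteredCacheConfigSketch {
  public static void main(String[] args) {
    CacheManager cacheManager = CacheManagerBuilder.newCacheManagerBuilder()
        // placeholder URI; point this at the real Terracotta stripe
        .with(ClusteringServiceConfigurationBuilder
            .cluster(URI.create("terracotta://localhost:9410/my-entity"))
            .autoCreate(c -> c))
        .withCache("mycaache",
            CacheConfigurationBuilder
                .newCacheConfigurationBuilder(String.class, MyCacheValue.class,
                    ResourcePoolsBuilder.newResourcePoolsBuilder()
                        .heap(100, MemoryUnit.MB)
                        .offheap(150, MemoryUnit.MB)
                        // dedicated pool carved out of the server's "offheap" resource
                        .with(ClusteredResourcePoolBuilder
                            .clusteredDedicated("offheap", 200, MemoryUnit.MB)))
                .withExpiry(ExpiryPolicyBuilder.timeToIdleExpiration(Duration.ofDays(1)))
                .withService(ClusteredStoreConfigurationBuilder
                    .withConsistency(Consistency.STRONG)))
        .build(true);
  }
}
```

This is a configuration fragment: it only compiles against the Ehcache clustered client jars and needs a running Terracotta server to actually execute.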
chrisdennis self-assigned this Dec 9, 2019
chrisdennis added the bug label Dec 9, 2019
chrisdennis (Member) commented:

@iainstarks I can certainly see the bug that caused the initial error. The second error is 'harmless', although it may be a symptom of other things happening. However, I'm not fully confident that the bug behind the initial error is also the cause of the subsequent exceptions.

If you could describe more about how you use the cache, that might help. In particular:

  • What operations do you perform against the cache?
  • What are the keys and values: types, sizes, etc.?

If you could upload a full set of logs somewhere so I can look through them, that would be useful too. In the meantime I can build you a copy of Ehcache 3 with this first fix applied, and we can cross our fingers that I'm not missing something and that this fixes both problems.

iainstarks (Author) commented:

Hi Chris,
Thanks for the response.
The cache holds some location info, so the key is a String and the value is a POJO with the following content:

public class LocationCheck implements Serializable {

    private static final long serialVersionUID = 3033653990506158489L;

    private final LocationCheckType type;
    private final String alias;
    private final String region;
    private final long lastChecked;
    private final String result;
    private final String lat;
    private final String lng;
    private final long distanceInMiles;
    private final boolean passed;
    private final int delay;
    private final StatusInfo statusInfo;

    // constructor and accessors omitted
}

LocationCheckType is an enum. StatusInfo is a wrapper around a List of ErrorInfo, which is a class with two Strings. The list will usually be empty or have a single entry, so the values are not very big.
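To put a rough number on "not very big", here is a quick Java-serialization round trip of a stand-in value shaped like the class above. Field names are copied from the snippet, but the enum constants, the ErrorInfo fields, the sample values, and the `SizeCheck` class are all invented for illustration; the real serialized size will differ somewhat.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Simplified stand-ins for the classes described above.
enum LocationCheckType { GPS, IP }
record ErrorInfo(String code, String message) implements Serializable {}
record StatusInfo(List<ErrorInfo> errors) implements Serializable {}
record LocationCheck(LocationCheckType type, String alias, String region,
                     long lastChecked, String result, String lat, String lng,
                     long distanceInMiles, boolean passed, int delay,
                     StatusInfo statusInfo) implements Serializable {}

public class SizeCheck {
  public static void main(String[] args) throws IOException {
    LocationCheck value = new LocationCheck(LocationCheckType.GPS, "alias-1",
        "eu-west", System.currentTimeMillis(), "OK", "51.5", "-0.12",
        3, true, 0, new StatusInfo(List.of()));

    // Serialize once and report the byte count.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    System.out.println("serialized size in bytes: " + bytes.size());
  }
}
```

On a typical JVM this lands in the hundreds of bytes, dominated by the class descriptors rather than the payload, which is consistent with "a few thousand entries" fitting comfortably in the configured pools.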

I think a log will be small enough to email. I'll try that.

iainstarks (Author) commented:

Drop me an email to iainstarks@gmail.com and I'll send you the logs.

As for operations: we do a put, then a few gets, and then most likely a remove. We might repeat the put if the info changes. The cache has a TTI of one day.

I would guess there would be a few thousand entries.

There are a few other caches, but I would think that one is the likely culprit.

chrisdennis (Member) commented:

I think I know what the bigger issue is and have a fix lined up. More accurately, I've found a real issue, and it might be the cause of your bug. Do you have a testing environment where we can try out a build to see if this fixes the issue for you?

iainstarks (Author) commented:

Replied by email, but just to reiterate: no, this issue has only appeared for us on one production platform. We've tried fairly hard to reproduce it elsewhere, with load tests and general testing, and it has also not occurred on our other production environments. If you have some pointers as to what might cause it and how we might reproduce it, that would be great. I'd really like to be able to do that so we could have some confidence that the issue is fixed and won't happen again. We're a little backed into a corner otherwise, as you can imagine.
